* [PATCH 0/4] [RFC][PATCH] fs-writeback: redefining the dirty inode queues
[not found] <20070810063412.238042387@mail.ustc.edu.cn>
@ 2007-08-10 6:34 ` Fengguang Wu
2007-08-10 6:34 ` Fengguang Wu
` (4 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Fengguang Wu @ 2007-08-10 6:34 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ken Chen, Miklos Szeredi, linux-kernel, linux-fsdevel
Andrew,
I'd like to propose a cleaner way of using the s_dirty, s_io, s_more_io
queues for the writeback of dirty inodes. The basic idea is to clearly
define the function of the queues, especially to decouple s_diry from
s_io/s_more_io. The details are in the changelog of patch 2.
The patches are some cleanups on top of Andrew's s_dirty time-ordering
patches and Ken's s_more_io patch:
[PATCH 1/4] writeback: check time-ordering of s_io and s_more_io
[PATCH 2/4] writeback: 3-queue based writeback schedule
[PATCH 3/4] writeback: function renames and cleanups
[PATCH 4/4] writeback: fix ntfs with sb_has_dirty_inodes()
fs/fs-writeback.c | 196 +++++++++++++++++++++++--------------------
fs/ntfs/super.c | 4
include/linux/fs.h | 1
3 files changed, 108 insertions(+), 93 deletions(-)
Note that the patches need rework for inserting into the right place of
your queue of -mm patches. I'll take care of it in the next take.
Thank you,
Fengguang
--
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 0/4] [RFC][PATCH] fs-writeback: redefining the dirty inode queues
[not found] <20070810063412.238042387@mail.ustc.edu.cn>
2007-08-10 6:34 ` [PATCH 0/4] [RFC][PATCH] fs-writeback: redefining the dirty inode queues Fengguang Wu
@ 2007-08-10 6:34 ` Fengguang Wu
[not found] ` <20070810063419.454829766@mail.ustc.edu.cn>
` (3 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Fengguang Wu @ 2007-08-10 6:34 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ken Chen, Miklos Szeredi, linux-kernel, linux-fsdevel
Andrew,
I'd like to propose a cleaner way of using the s_dirty, s_io, s_more_io
queues for the writeback of dirty inodes. The basic idea is to clearly
define the function of the queues, especially to decouple s_diry from
s_io/s_more_io. The details are in the changelog of patch 2.
The patches are some cleanups on top of Andrew's s_dirty time-ordering
patches and Ken's s_more_io patch:
[PATCH 1/4] writeback: check time-ordering of s_io and s_more_io
[PATCH 2/4] writeback: 3-queue based writeback schedule
[PATCH 3/4] writeback: function renames and cleanups
[PATCH 4/4] writeback: fix ntfs with sb_has_dirty_inodes()
fs/fs-writeback.c | 196 +++++++++++++++++++++++--------------------
fs/ntfs/super.c | 4
include/linux/fs.h | 1
3 files changed, 108 insertions(+), 93 deletions(-)
Note that the patches need rework for inserting into the right place of
your queue of -mm patches. I'll take care of it in the next take.
Thank you,
Fengguang
--
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/4] writeback: check time-ordering of s_io and s_more_io
[not found] ` <20070810063419.454829766@mail.ustc.edu.cn>
@ 2007-08-10 6:34 ` Fengguang Wu
2007-08-10 6:34 ` Fengguang Wu
1 sibling, 0 replies; 13+ messages in thread
From: Fengguang Wu @ 2007-08-10 6:34 UTC (permalink / raw)
To: Andrew Morton
Cc: Ken Chen, Andrew Morton, Miklos Szeredi, linux-kernel,
linux-fsdevel
[-- Attachment #1: io-dirty-check.patch --]
[-- Type: text/plain, Size: 3964 bytes --]
It helps catch bugs like this:
[ 738.645689] fs/fs-writeback.c:535: s_dirty got screwed up
[ 738.646114] ffff8100028532b0:4295082249
[ 738.646255] ffff810002856858:4295082259
[ 738.646388] ffff810002831b58:4295082667
[ 738.646520] ffff81000281b1b0:4295082671
[ 738.646651] ffff81000281d798:4295083507
========================================== s_dirty/s_io
[ 738.646783] ffff81000287e998:4295081393
[ 738.646916] ffff81000287e430:4295081403
[ 738.647068] ffff8100028789d8:4295081409
[ 738.647212] ffff810002878470:4295081415
[ 738.647358] ffff810002884a18:4295081421
[ 738.647503] ffff8100028844b0:4295081427
[ 738.647648] ffff810002890a58:4295081433
[ 738.647782] ffff8100028904f0:4295081441
[ 738.647894] ffff81000288da98:4295081449
[ 738.648011] ffff81000288d530:4295081455
[ 738.648123] ffff810002897ad8:4295081461
[ 738.648235] ffff810002897570:4295081469
[ 738.648347] ffff810002894b18:4295081477
[ 738.648459] ffff8100028945b0:4295081483
The buggy line 534 is
list_splice_init(&sb->s_io, sb->s_dirty.prev);
This is not the only time-ordering bug in linux-2.6.23-rc1-mm2.
Let's fix them all.
Cc: Ken Chen <kenchen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
fs/fs-writeback.c | 37 +++++++++++++++++++++++++++++++------
1 file changed, 31 insertions(+), 6 deletions(-)
--- linux-2.6.23-rc1-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc1-mm2/fs/fs-writeback.c
@@ -26,12 +26,12 @@
int sysctl_inode_debug __read_mostly;
-static int __check(struct super_block *sb, int print_stuff)
+static int __check(struct list_head *head, int print_stuff)
{
- struct list_head *cursor = &sb->s_dirty;
+ struct list_head *cursor = head;
unsigned long dirtied_when = 0;
- while ((cursor = cursor->prev) != &sb->s_dirty) {
+ while ((cursor = cursor->prev) != head) {
struct inode *inode = list_entry(cursor, struct inode, i_list);
if (print_stuff) {
printk("%p:%lu\n", inode, inode->dirtied_when);
@@ -51,14 +51,32 @@ static void __check_dirty_inode_list(str
if (!sysctl_inode_debug)
return;
- if (__check(sb, 0)) {
+ if (__check(&sb->s_dirty, 0)) {
sysctl_inode_debug = 0;
if (inode)
printk("%s:%d: s_dirty got screwed up. inode=%p:%lu\n",
file, line, inode, inode->dirtied_when);
else
printk("%s:%d: s_dirty got screwed up\n", file, line);
- __check(sb, 1);
+ __check(&sb->s_dirty, 1);
+ }
+ if (__check(&sb->s_io, 0)) {
+ sysctl_inode_debug = 0;
+ if (inode)
+ printk("%s:%d: s_io got screwed up. inode=%p:%lu\n",
+ file, line, inode, inode->dirtied_when);
+ else
+ printk("%s:%d: s_io got screwed up\n", file, line);
+ __check(&sb->s_io, 1);
+ }
+ if (__check(&sb->s_more_io, 0)) {
+ sysctl_inode_debug = 0;
+ if (inode)
+ printk("%s:%d: s_more_io got screwed up. inode=%p:%lu\n",
+ file, line, inode, inode->dirtied_when);
+ else
+ printk("%s:%d: s_more_io got screwed up\n", file, line);
+ __check(&sb->s_more_io, 1);
}
}
@@ -223,7 +241,9 @@ static void redirty_tail(struct inode *i
*/
static void requeue_io(struct inode *inode)
{
+ check_dirty_inode(inode);
list_move(&inode->i_list, &inode->i_sb->s_more_io);
+ check_dirty_inode(inode);
}
static void inode_sync_complete(struct inode *inode)
@@ -483,7 +503,9 @@ int generic_sync_sb_inodes(struct super_
/* Was this inode dirtied too recently? */
if (wbc->older_than_this && time_after(inode->dirtied_when,
*wbc->older_than_this)) {
+ check_dirty_inode_list(sb);
list_splice_init(&sb->s_io, sb->s_dirty.prev);
+ check_dirty_inode_list(sb);
break;
}
@@ -520,8 +542,11 @@ int generic_sync_sb_inodes(struct super_
break;
}
- if (list_empty(&sb->s_io))
+ if (list_empty(&sb->s_io)) {
+ check_dirty_inode_list(sb);
list_splice_init(&sb->s_more_io, &sb->s_io);
+ check_dirty_inode_list(sb);
+ }
spin_unlock(&inode_lock);
return ret; /* Leave any unwritten inodes on s_io */
}
--
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/4] writeback: check time-ordering of s_io and s_more_io
[not found] ` <20070810063419.454829766@mail.ustc.edu.cn>
2007-08-10 6:34 ` [PATCH 1/4] writeback: check time-ordering of s_io and s_more_io Fengguang Wu
@ 2007-08-10 6:34 ` Fengguang Wu
1 sibling, 0 replies; 13+ messages in thread
From: Fengguang Wu @ 2007-08-10 6:34 UTC (permalink / raw)
To: Andrew Morton
Cc: Ken Chen, Andrew Morton, Miklos Szeredi, linux-kernel,
linux-fsdevel
[-- Attachment #1: io-dirty-check.patch --]
[-- Type: text/plain, Size: 3964 bytes --]
It helps catch bugs like this:
[ 738.645689] fs/fs-writeback.c:535: s_dirty got screwed up
[ 738.646114] ffff8100028532b0:4295082249
[ 738.646255] ffff810002856858:4295082259
[ 738.646388] ffff810002831b58:4295082667
[ 738.646520] ffff81000281b1b0:4295082671
[ 738.646651] ffff81000281d798:4295083507
========================================== s_dirty/s_io
[ 738.646783] ffff81000287e998:4295081393
[ 738.646916] ffff81000287e430:4295081403
[ 738.647068] ffff8100028789d8:4295081409
[ 738.647212] ffff810002878470:4295081415
[ 738.647358] ffff810002884a18:4295081421
[ 738.647503] ffff8100028844b0:4295081427
[ 738.647648] ffff810002890a58:4295081433
[ 738.647782] ffff8100028904f0:4295081441
[ 738.647894] ffff81000288da98:4295081449
[ 738.648011] ffff81000288d530:4295081455
[ 738.648123] ffff810002897ad8:4295081461
[ 738.648235] ffff810002897570:4295081469
[ 738.648347] ffff810002894b18:4295081477
[ 738.648459] ffff8100028945b0:4295081483
The buggy line 534 is
list_splice_init(&sb->s_io, sb->s_dirty.prev);
This is not the only time-ordering bug in linux-2.6.23-rc1-mm2.
Let's fix them all.
Cc: Ken Chen <kenchen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
fs/fs-writeback.c | 37 +++++++++++++++++++++++++++++++------
1 file changed, 31 insertions(+), 6 deletions(-)
--- linux-2.6.23-rc1-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc1-mm2/fs/fs-writeback.c
@@ -26,12 +26,12 @@
int sysctl_inode_debug __read_mostly;
-static int __check(struct super_block *sb, int print_stuff)
+static int __check(struct list_head *head, int print_stuff)
{
- struct list_head *cursor = &sb->s_dirty;
+ struct list_head *cursor = head;
unsigned long dirtied_when = 0;
- while ((cursor = cursor->prev) != &sb->s_dirty) {
+ while ((cursor = cursor->prev) != head) {
struct inode *inode = list_entry(cursor, struct inode, i_list);
if (print_stuff) {
printk("%p:%lu\n", inode, inode->dirtied_when);
@@ -51,14 +51,32 @@ static void __check_dirty_inode_list(str
if (!sysctl_inode_debug)
return;
- if (__check(sb, 0)) {
+ if (__check(&sb->s_dirty, 0)) {
sysctl_inode_debug = 0;
if (inode)
printk("%s:%d: s_dirty got screwed up. inode=%p:%lu\n",
file, line, inode, inode->dirtied_when);
else
printk("%s:%d: s_dirty got screwed up\n", file, line);
- __check(sb, 1);
+ __check(&sb->s_dirty, 1);
+ }
+ if (__check(&sb->s_io, 0)) {
+ sysctl_inode_debug = 0;
+ if (inode)
+ printk("%s:%d: s_io got screwed up. inode=%p:%lu\n",
+ file, line, inode, inode->dirtied_when);
+ else
+ printk("%s:%d: s_io got screwed up\n", file, line);
+ __check(&sb->s_io, 1);
+ }
+ if (__check(&sb->s_more_io, 0)) {
+ sysctl_inode_debug = 0;
+ if (inode)
+ printk("%s:%d: s_more_io got screwed up. inode=%p:%lu\n",
+ file, line, inode, inode->dirtied_when);
+ else
+ printk("%s:%d: s_more_io got screwed up\n", file, line);
+ __check(&sb->s_more_io, 1);
}
}
@@ -223,7 +241,9 @@ static void redirty_tail(struct inode *i
*/
static void requeue_io(struct inode *inode)
{
+ check_dirty_inode(inode);
list_move(&inode->i_list, &inode->i_sb->s_more_io);
+ check_dirty_inode(inode);
}
static void inode_sync_complete(struct inode *inode)
@@ -483,7 +503,9 @@ int generic_sync_sb_inodes(struct super_
/* Was this inode dirtied too recently? */
if (wbc->older_than_this && time_after(inode->dirtied_when,
*wbc->older_than_this)) {
+ check_dirty_inode_list(sb);
list_splice_init(&sb->s_io, sb->s_dirty.prev);
+ check_dirty_inode_list(sb);
break;
}
@@ -520,8 +542,11 @@ int generic_sync_sb_inodes(struct super_
break;
}
- if (list_empty(&sb->s_io))
+ if (list_empty(&sb->s_io)) {
+ check_dirty_inode_list(sb);
list_splice_init(&sb->s_more_io, &sb->s_io);
+ check_dirty_inode_list(sb);
+ }
spin_unlock(&inode_lock);
return ret; /* Leave any unwritten inodes on s_io */
}
--
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/4] writeback: 3-queue based writeback schedule
[not found] ` <20070810063419.549052142@mail.ustc.edu.cn>
2007-08-10 6:34 ` [PATCH 2/4] writeback: 3-queue based writeback schedule Fengguang Wu
@ 2007-08-10 6:34 ` Fengguang Wu
[not found] ` <20070810164715.GA5508@mail.ustc.edu.cn>
2 siblings, 0 replies; 13+ messages in thread
From: Fengguang Wu @ 2007-08-10 6:34 UTC (permalink / raw)
To: Andrew Morton
Cc: Ken Chen, Andrew Morton, Miklos Szeredi, linux-kernel,
linux-fsdevel
[-- Attachment #1: io-dirty-order-fix.patch --]
[-- Type: text/plain, Size: 9153 bytes --]
Properly manage the 3 queues of sb->s_dirty/s_io/s_more_io so that
- time-ordering of dirtied_when can be easily maintained
- writeback can continue from where previous run left out
The majority work has been done by Andrew Morton and Ken Chen,
this patch just clarifies the roles of the 3 queues:
- s_dirty for io delay(up to dirty_expire_interval)
- s_io for io run(a full scan of s_io may involve multiple runs)
- s_more_io for io continuation
The following paradigm shows the data flow.
requeue on new scan(empty s_io)
+-----------------------------+
| |
dirty old | |
inodes enough V |
======> s_dirty ======> s_io |
^ | requeue io |
| +---------------------> s_more_io
| hold back |
+----------------+----------> disk write requests
sb->s_dirty: a FIFO queue
- s_dirty hosts not-yet-expired(recently dirtied) dirty inodes
- once expired, inodes will be moved out of s_dirty and *never put back*
(unless for some reason we have to hold on the inode for some time)
sb->s_io and sb->s_more_io: a cyclic queue scanned for io
- on each run of generic_sync_sb_inodes(), some more s_dirty inodes may be
appended to s_io
- on each full scan of s_io, all s_more_io inodes will be moved back to s_io
- large files that cannot be synced in one run will be moved to s_more_io for
retry on next full scan
inode->dirtied_when
- inode->dirtied_when is updated to the *current* jiffies on pushing into
s_dirty, and is never changed in other cases.
- time-ordering thus can be simply ensured while moving inodes between lists,
since (time order == enqueue order)
Cc: Ken Chen <kenchen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
fs/fs-writeback.c | 106 +++++++++++++++++++++-----------------------
1 file changed, 52 insertions(+), 54 deletions(-)
--- linux-2.6.23-rc1-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc1-mm2/fs/fs-writeback.c
@@ -93,6 +93,15 @@ static void __check_dirty_inode_list(str
__FILE__, __LINE__); \
} while (0)
+
+int sb_has_dirty_inodes(struct super_block *sb)
+{
+ return !list_empty(&sb->s_dirty) ||
+ !list_empty(&sb->s_io) ||
+ !list_empty(&sb->s_more_io);
+}
+EXPORT_SYMBOL(sb_has_dirty_inodes);
+
/**
* __mark_inode_dirty - internal function
* @inode: inode to mark
@@ -187,7 +196,7 @@ void __mark_inode_dirty(struct inode *in
goto out;
/*
- * If the inode was already on s_dirty or s_io, don't
+ * If the inode was already on s_dirty/s_io/s_more_io, don't
* reposition it (that would break s_dirty time-ordering).
*/
if (!was_dirty) {
@@ -211,33 +220,20 @@ static int write_inode(struct inode *ino
}
/*
- * Redirty an inode: set its when-it-was dirtied timestamp and move it to the
- * furthest end of its superblock's dirty-inode list.
- *
- * Before stamping the inode's ->dirtied_when, we check to see whether it is
- * already the most-recently-dirtied inode on the s_dirty list. If that is
- * the case then the inode must have been redirtied while it was being written
- * out and we don't reset its dirtied_when.
+ * Enqueue a newly dirtied inode:
+ * - set its when-it-was dirtied timestamp
+ * - move it to the furthest end of its superblock's dirty-inode list
*/
static void redirty_tail(struct inode *inode)
{
- struct super_block *sb = inode->i_sb;
-
check_dirty_inode(inode);
- if (!list_empty(&sb->s_dirty)) {
- struct inode *tail_inode;
-
- tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list);
- if (!time_after_eq(inode->dirtied_when,
- tail_inode->dirtied_when))
- inode->dirtied_when = jiffies;
- }
- list_move(&inode->i_list, &sb->s_dirty);
+ inode->dirtied_when = jiffies;
+ list_move(&inode->i_list, &inode->i_sb->s_dirty);
check_dirty_inode(inode);
}
/*
- * requeue inode for re-scanning after sb->s_io list is exhausted.
+ * Queue an inode for more io in the next full scan of s_io.
*/
static void requeue_io(struct inode *inode)
{
@@ -246,6 +242,32 @@ static void requeue_io(struct inode *ino
check_dirty_inode(inode);
}
+/*
+ * Queue all possible inodes for a run of io.
+ * The resulting s_io is in order of:
+ * - inodes queued for more io from s_more_io(once for a full scan of s_io)
+ * - possible remaining inodes in s_io(was a partial scan)
+ * - dirty inodes (old enough) from s_dirty
+ */
+static void queue_inodes_for_io(struct super_block *sb,
+ unsigned long *older_than_this)
+{
+ check_dirty_inode_list(sb);
+ if (list_empty(&sb->s_io))
+ list_splice_init(&sb->s_more_io, &sb->s_io); /* eldest first */
+ check_dirty_inode_list(sb);
+ while (!list_empty(&sb->s_dirty)) {
+ struct inode *inode = list_entry(sb->s_dirty.prev,
+ struct inode, i_list);
+ /* Was this inode dirtied too recently? */
+ if (older_than_this &&
+ time_after(inode->dirtied_when, *older_than_this))
+ break;
+ list_move(&inode->i_list, &sb->s_io);
+ }
+ check_dirty_inode_list(sb);
+}
+
static void inode_sync_complete(struct inode *inode)
{
/*
@@ -305,7 +327,7 @@ __sync_single_inode(struct inode *inode,
/*
* We didn't write back all the pages. nfs_writepages()
* sometimes bales out without doing anything. Redirty
- * the inode. It is moved from s_io onto s_dirty.
+ * the inode; Move it from s_io onto s_more_io/s_dirty.
*/
/*
* akpm: if the caller was the kupdate function we put
@@ -318,10 +340,9 @@ __sync_single_inode(struct inode *inode,
*/
if (wbc->for_kupdate) {
/*
- * For the kupdate function we leave the inode
- * at the head of sb_dirty so it will get more
- * writeout as soon as the queue becomes
- * uncongested.
+ * For the kupdate function we move the inode
+ * to s_more_io so it will get more writeout as
+ * soon as the queue becomes uncongested.
*/
inode->i_state |= I_DIRTY_PAGES;
requeue_io(inode);
@@ -380,9 +401,9 @@ __writeback_single_inode(struct inode *i
/*
* We're skipping this inode because it's locked, and we're not
* doing writeback-for-data-integrity. Move it to the head of
- * s_dirty so that writeback can proceed with the other inodes
+ * s_more_io so that writeback can proceed with the other inodes
* on s_io. We'll have another go at writing back this inode
- * when the s_dirty iodes get moved back onto s_io.
+ * when we completed a full scan of s_io.
*/
requeue_io(inode);
@@ -444,16 +465,12 @@ __writeback_single_inode(struct inode *i
int generic_sync_sb_inodes(struct super_block *sb,
struct writeback_control *wbc)
{
- const unsigned long start = jiffies; /* livelock avoidance */
int ret = 0;
spin_lock(&inode_lock);
- if (!wbc->for_kupdate || list_empty(&sb->s_io)) {
- check_dirty_inode_list(sb);
- list_splice_init(&sb->s_dirty, &sb->s_io);
- check_dirty_inode_list(sb);
- }
+ if (!wbc->for_kupdate || list_empty(&sb->s_io))
+ queue_inodes_for_io(sb, wbc->older_than_this);
while (!list_empty(&sb->s_io)) {
int err;
@@ -496,19 +513,6 @@ int generic_sync_sb_inodes(struct super_
continue; /* blockdev has wrong queue */
}
- /* Was this inode dirtied after sync_sb_inodes was called? */
- if (time_after(inode->dirtied_when, start))
- break;
-
- /* Was this inode dirtied too recently? */
- if (wbc->older_than_this && time_after(inode->dirtied_when,
- *wbc->older_than_this)) {
- check_dirty_inode_list(sb);
- list_splice_init(&sb->s_io, sb->s_dirty.prev);
- check_dirty_inode_list(sb);
- break;
- }
-
/* Is another pdflush already flushing this queue? */
if (current_is_pdflush() && !writeback_acquire(bdi))
break;
@@ -541,12 +545,6 @@ int generic_sync_sb_inodes(struct super_
if (wbc->nr_to_write <= 0)
break;
}
-
- if (list_empty(&sb->s_io)) {
- check_dirty_inode_list(sb);
- list_splice_init(&sb->s_more_io, &sb->s_io);
- check_dirty_inode_list(sb);
- }
spin_unlock(&inode_lock);
return ret; /* Leave any unwritten inodes on s_io */
}
@@ -566,7 +564,7 @@ static int sync_sb_inodes(struct super_b
* Note:
* We don't need to grab a reference to superblock here. If it has non-empty
* ->s_dirty it's hadn't been killed yet and kill_super() won't proceed
- * past sync_inodes_sb() until both the ->s_dirty and ->s_io lists are
+ * past sync_inodes_sb() until the ->s_dirty/s_io/s_more_io lists are all
* empty. Since __sync_single_inode() regains inode_lock before it finally moves
* inode from superblock lists we are OK.
*
@@ -589,7 +587,7 @@ int writeback_inodes(struct writeback_co
restart:
sb = sb_entry(super_blocks.prev);
for (; sb != sb_entry(&super_blocks); sb = sb_entry(sb->s_list.prev)) {
- if (!list_empty(&sb->s_dirty) || !list_empty(&sb->s_io)) {
+ if (sb_has_dirty_inodes(sb)) {
/* we're making our own get_super here */
sb->s_count++;
spin_unlock(&sb_lock);
--
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/4] writeback: 3-queue based writeback schedule
[not found] ` <20070810063419.549052142@mail.ustc.edu.cn>
@ 2007-08-10 6:34 ` Fengguang Wu
2007-08-10 6:34 ` Fengguang Wu
[not found] ` <20070810164715.GA5508@mail.ustc.edu.cn>
2 siblings, 0 replies; 13+ messages in thread
From: Fengguang Wu @ 2007-08-10 6:34 UTC (permalink / raw)
To: Andrew Morton
Cc: Ken Chen, Andrew Morton, Miklos Szeredi, linux-kernel,
linux-fsdevel
[-- Attachment #1: io-dirty-order-fix.patch --]
[-- Type: text/plain, Size: 9153 bytes --]
Properly manage the 3 queues of sb->s_dirty/s_io/s_more_io so that
- time-ordering of dirtied_when can be easily maintained
- writeback can continue from where previous run left out
The majority work has been done by Andrew Morton and Ken Chen,
this patch just clarifies the roles of the 3 queues:
- s_dirty for io delay(up to dirty_expire_interval)
- s_io for io run(a full scan of s_io may involve multiple runs)
- s_more_io for io continuation
The following paradigm shows the data flow.
requeue on new scan(empty s_io)
+-----------------------------+
| |
dirty old | |
inodes enough V |
======> s_dirty ======> s_io |
^ | requeue io |
| +---------------------> s_more_io
| hold back |
+----------------+----------> disk write requests
sb->s_dirty: a FIFO queue
- s_dirty hosts not-yet-expired(recently dirtied) dirty inodes
- once expired, inodes will be moved out of s_dirty and *never put back*
(unless for some reason we have to hold on the inode for some time)
sb->s_io and sb->s_more_io: a cyclic queue scanned for io
- on each run of generic_sync_sb_inodes(), some more s_dirty inodes may be
appended to s_io
- on each full scan of s_io, all s_more_io inodes will be moved back to s_io
- large files that cannot be synced in one run will be moved to s_more_io for
retry on next full scan
inode->dirtied_when
- inode->dirtied_when is updated to the *current* jiffies on pushing into
s_dirty, and is never changed in other cases.
- time-ordering thus can be simply ensured while moving inodes between lists,
since (time order == enqueue order)
Cc: Ken Chen <kenchen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
fs/fs-writeback.c | 106 +++++++++++++++++++++-----------------------
1 file changed, 52 insertions(+), 54 deletions(-)
--- linux-2.6.23-rc1-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc1-mm2/fs/fs-writeback.c
@@ -93,6 +93,15 @@ static void __check_dirty_inode_list(str
__FILE__, __LINE__); \
} while (0)
+
+int sb_has_dirty_inodes(struct super_block *sb)
+{
+ return !list_empty(&sb->s_dirty) ||
+ !list_empty(&sb->s_io) ||
+ !list_empty(&sb->s_more_io);
+}
+EXPORT_SYMBOL(sb_has_dirty_inodes);
+
/**
* __mark_inode_dirty - internal function
* @inode: inode to mark
@@ -187,7 +196,7 @@ void __mark_inode_dirty(struct inode *in
goto out;
/*
- * If the inode was already on s_dirty or s_io, don't
+ * If the inode was already on s_dirty/s_io/s_more_io, don't
* reposition it (that would break s_dirty time-ordering).
*/
if (!was_dirty) {
@@ -211,33 +220,20 @@ static int write_inode(struct inode *ino
}
/*
- * Redirty an inode: set its when-it-was dirtied timestamp and move it to the
- * furthest end of its superblock's dirty-inode list.
- *
- * Before stamping the inode's ->dirtied_when, we check to see whether it is
- * already the most-recently-dirtied inode on the s_dirty list. If that is
- * the case then the inode must have been redirtied while it was being written
- * out and we don't reset its dirtied_when.
+ * Enqueue a newly dirtied inode:
+ * - set its when-it-was dirtied timestamp
+ * - move it to the furthest end of its superblock's dirty-inode list
*/
static void redirty_tail(struct inode *inode)
{
- struct super_block *sb = inode->i_sb;
-
check_dirty_inode(inode);
- if (!list_empty(&sb->s_dirty)) {
- struct inode *tail_inode;
-
- tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list);
- if (!time_after_eq(inode->dirtied_when,
- tail_inode->dirtied_when))
- inode->dirtied_when = jiffies;
- }
- list_move(&inode->i_list, &sb->s_dirty);
+ inode->dirtied_when = jiffies;
+ list_move(&inode->i_list, &inode->i_sb->s_dirty);
check_dirty_inode(inode);
}
/*
- * requeue inode for re-scanning after sb->s_io list is exhausted.
+ * Queue an inode for more io in the next full scan of s_io.
*/
static void requeue_io(struct inode *inode)
{
@@ -246,6 +242,32 @@ static void requeue_io(struct inode *ino
check_dirty_inode(inode);
}
+/*
+ * Queue all possible inodes for a run of io.
+ * The resulting s_io is in order of:
+ * - inodes queued for more io from s_more_io(once for a full scan of s_io)
+ * - possible remaining inodes in s_io(was a partial scan)
+ * - dirty inodes (old enough) from s_dirty
+ */
+static void queue_inodes_for_io(struct super_block *sb,
+ unsigned long *older_than_this)
+{
+ check_dirty_inode_list(sb);
+ if (list_empty(&sb->s_io))
+ list_splice_init(&sb->s_more_io, &sb->s_io); /* eldest first */
+ check_dirty_inode_list(sb);
+ while (!list_empty(&sb->s_dirty)) {
+ struct inode *inode = list_entry(sb->s_dirty.prev,
+ struct inode, i_list);
+ /* Was this inode dirtied too recently? */
+ if (older_than_this &&
+ time_after(inode->dirtied_when, *older_than_this))
+ break;
+ list_move(&inode->i_list, &sb->s_io);
+ }
+ check_dirty_inode_list(sb);
+}
+
static void inode_sync_complete(struct inode *inode)
{
/*
@@ -305,7 +327,7 @@ __sync_single_inode(struct inode *inode,
/*
* We didn't write back all the pages. nfs_writepages()
* sometimes bales out without doing anything. Redirty
- * the inode. It is moved from s_io onto s_dirty.
+ * the inode; Move it from s_io onto s_more_io/s_dirty.
*/
/*
* akpm: if the caller was the kupdate function we put
@@ -318,10 +340,9 @@ __sync_single_inode(struct inode *inode,
*/
if (wbc->for_kupdate) {
/*
- * For the kupdate function we leave the inode
- * at the head of sb_dirty so it will get more
- * writeout as soon as the queue becomes
- * uncongested.
+ * For the kupdate function we move the inode
+ * to s_more_io so it will get more writeout as
+ * soon as the queue becomes uncongested.
*/
inode->i_state |= I_DIRTY_PAGES;
requeue_io(inode);
@@ -380,9 +401,9 @@ __writeback_single_inode(struct inode *i
/*
* We're skipping this inode because it's locked, and we're not
* doing writeback-for-data-integrity. Move it to the head of
- * s_dirty so that writeback can proceed with the other inodes
+ * s_more_io so that writeback can proceed with the other inodes
* on s_io. We'll have another go at writing back this inode
- * when the s_dirty iodes get moved back onto s_io.
+ * when we completed a full scan of s_io.
*/
requeue_io(inode);
@@ -444,16 +465,12 @@ __writeback_single_inode(struct inode *i
int generic_sync_sb_inodes(struct super_block *sb,
struct writeback_control *wbc)
{
- const unsigned long start = jiffies; /* livelock avoidance */
int ret = 0;
spin_lock(&inode_lock);
- if (!wbc->for_kupdate || list_empty(&sb->s_io)) {
- check_dirty_inode_list(sb);
- list_splice_init(&sb->s_dirty, &sb->s_io);
- check_dirty_inode_list(sb);
- }
+ if (!wbc->for_kupdate || list_empty(&sb->s_io))
+ queue_inodes_for_io(sb, wbc->older_than_this);
while (!list_empty(&sb->s_io)) {
int err;
@@ -496,19 +513,6 @@ int generic_sync_sb_inodes(struct super_
continue; /* blockdev has wrong queue */
}
- /* Was this inode dirtied after sync_sb_inodes was called? */
- if (time_after(inode->dirtied_when, start))
- break;
-
- /* Was this inode dirtied too recently? */
- if (wbc->older_than_this && time_after(inode->dirtied_when,
- *wbc->older_than_this)) {
- check_dirty_inode_list(sb);
- list_splice_init(&sb->s_io, sb->s_dirty.prev);
- check_dirty_inode_list(sb);
- break;
- }
-
/* Is another pdflush already flushing this queue? */
if (current_is_pdflush() && !writeback_acquire(bdi))
break;
@@ -541,12 +545,6 @@ int generic_sync_sb_inodes(struct super_
if (wbc->nr_to_write <= 0)
break;
}
-
- if (list_empty(&sb->s_io)) {
- check_dirty_inode_list(sb);
- list_splice_init(&sb->s_more_io, &sb->s_io);
- check_dirty_inode_list(sb);
- }
spin_unlock(&inode_lock);
return ret; /* Leave any unwritten inodes on s_io */
}
@@ -566,7 +564,7 @@ static int sync_sb_inodes(struct super_b
* Note:
* We don't need to grab a reference to superblock here. If it has non-empty
* ->s_dirty it's hadn't been killed yet and kill_super() won't proceed
- * past sync_inodes_sb() until both the ->s_dirty and ->s_io lists are
+ * past sync_inodes_sb() until the ->s_dirty/s_io/s_more_io lists are all
* empty. Since __sync_single_inode() regains inode_lock before it finally moves
* inode from superblock lists we are OK.
*
@@ -589,7 +587,7 @@ int writeback_inodes(struct writeback_co
restart:
sb = sb_entry(super_blocks.prev);
for (; sb != sb_entry(&super_blocks); sb = sb_entry(sb->s_list.prev)) {
- if (!list_empty(&sb->s_dirty) || !list_empty(&sb->s_io)) {
+ if (sb_has_dirty_inodes(sb)) {
/* we're making our own get_super here */
sb->s_count++;
spin_unlock(&sb_lock);
--
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 3/4] writeback: function renames and cleanups
[not found] ` <20070810063419.657077419@mail.ustc.edu.cn>
@ 2007-08-10 6:34 ` Fengguang Wu
2007-08-10 6:34 ` Fengguang Wu
1 sibling, 0 replies; 13+ messages in thread
From: Fengguang Wu @ 2007-08-10 6:34 UTC (permalink / raw)
To: Andrew Morton
Cc: Ken Chen, Andrew Morton, Miklos Szeredi, linux-kernel,
linux-fsdevel
[-- Attachment #1: fs-writeback-cleanup.patch --]
[-- Type: text/plain, Size: 7280 bytes --]
Two function renames:
- rename redirty_tail() to queue_dirty_inode()
- rename requeue_io() to queue_for_more_io()
Also some code cleanups on fs-writeback.c. No behavior changes.
Cc: Ken Chen <kenchen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
fs/fs-writeback.c | 133 ++++++++++++++++++++------------------------
1 file changed, 62 insertions(+), 71 deletions(-)
--- linux-2.6.23-rc1-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc1-mm2/fs/fs-writeback.c
@@ -102,6 +102,55 @@ int sb_has_dirty_inodes(struct super_blo
}
EXPORT_SYMBOL(sb_has_dirty_inodes);
+/*
+ * Enqueue a newly dirtied inode:
+ * - set its when-it-was dirtied timestamp
+ * - move it to the furthest end of its superblock's dirty-inode list
+ */
+static void queue_dirty_inode(struct inode *inode)
+{
+ check_dirty_inode(inode);
+ inode->dirtied_when = jiffies;
+ list_move(&inode->i_list, &inode->i_sb->s_dirty);
+ check_dirty_inode(inode);
+}
+
+/*
+ * Queue an inode for more io in the next full scan of s_io.
+ */
+static void queue_for_more_io(struct inode *inode)
+{
+ check_dirty_inode(inode);
+ list_move(&inode->i_list, &inode->i_sb->s_more_io);
+ check_dirty_inode(inode);
+}
+
+/*
+ * Queue all possible inodes for a run of io.
+ * The resulting s_io is in order of:
+ * - inodes queued for more io from s_more_io(once for a full scan of s_io)
+ * - possible remaining inodes in s_io(was a partial scan)
+ * - dirty inodes (old enough) from s_dirty
+ */
+static void queue_inodes_for_io(struct super_block *sb,
+ unsigned long *older_than_this)
+{
+ check_dirty_inode_list(sb);
+ if (list_empty(&sb->s_io))
+ list_splice_init(&sb->s_more_io, &sb->s_io); /* eldest first */
+ check_dirty_inode_list(sb);
+ while (!list_empty(&sb->s_dirty)) {
+ struct inode *inode = list_entry(sb->s_dirty.prev,
+ struct inode, i_list);
+ /* Was this inode dirtied too recently? */
+ if (older_than_this &&
+ time_after(inode->dirtied_when, *older_than_this))
+ break;
+ list_move(&inode->i_list, &sb->s_io);
+ }
+ check_dirty_inode_list(sb);
+}
+
/**
* __mark_inode_dirty - internal function
* @inode: inode to mark
@@ -199,12 +248,8 @@ void __mark_inode_dirty(struct inode *in
* If the inode was already on s_dirty/s_io/s_more_io, don't
* reposition it (that would break s_dirty time-ordering).
*/
- if (!was_dirty) {
- check_dirty_inode(inode);
- inode->dirtied_when = jiffies;
- list_move(&inode->i_list, &sb->s_dirty);
- check_dirty_inode(inode);
- }
+ if (!was_dirty)
+ queue_dirty_inode(inode);
}
out:
spin_unlock(&inode_lock);
@@ -219,55 +264,6 @@ static int write_inode(struct inode *ino
return 0;
}
-/*
- * Enqueue a newly dirtied inode:
- * - set its when-it-was dirtied timestamp
- * - move it to the furthest end of its superblock's dirty-inode list
- */
-static void redirty_tail(struct inode *inode)
-{
- check_dirty_inode(inode);
- inode->dirtied_when = jiffies;
- list_move(&inode->i_list, &inode->i_sb->s_dirty);
- check_dirty_inode(inode);
-}
-
-/*
- * Queue an inode for more io in the next full scan of s_io.
- */
-static void requeue_io(struct inode *inode)
-{
- check_dirty_inode(inode);
- list_move(&inode->i_list, &inode->i_sb->s_more_io);
- check_dirty_inode(inode);
-}
-
-/*
- * Queue all possible inodes for a run of io.
- * The resulting s_io is in order of:
- * - inodes queued for more io from s_more_io(once for a full scan of s_io)
- * - possible remaining inodes in s_io(was a partial scan)
- * - dirty inodes (old enough) from s_dirty
- */
-static void queue_inodes_for_io(struct super_block *sb,
- unsigned long *older_than_this)
-{
- check_dirty_inode_list(sb);
- if (list_empty(&sb->s_io))
- list_splice_init(&sb->s_more_io, &sb->s_io); /* eldest first */
- check_dirty_inode_list(sb);
- while (!list_empty(&sb->s_dirty)) {
- struct inode *inode = list_entry(sb->s_dirty.prev,
- struct inode, i_list);
- /* Was this inode dirtied too recently? */
- if (older_than_this &&
- time_after(inode->dirtied_when, *older_than_this))
- break;
- list_move(&inode->i_list, &sb->s_io);
- }
- check_dirty_inode_list(sb);
-}
-
static void inode_sync_complete(struct inode *inode)
{
/*
@@ -329,6 +325,7 @@ __sync_single_inode(struct inode *inode,
* sometimes bales out without doing anything. Redirty
* the inode; Move it from s_io onto s_more_io/s_dirty.
*/
+ inode->i_state |= I_DIRTY_PAGES;
/*
* akpm: if the caller was the kupdate function we put
* this inode at the head of s_dirty so it gets first
@@ -344,8 +341,7 @@ __sync_single_inode(struct inode *inode,
* to s_more_io so it will get more writeout as
* soon as the queue becomes uncongested.
*/
- inode->i_state |= I_DIRTY_PAGES;
- requeue_io(inode);
+ queue_for_more_io(inode);
} else {
/*
* Otherwise fully redirty the inode so that
@@ -354,15 +350,14 @@ __sync_single_inode(struct inode *inode,
* file would indefinitely suspend writeout of
* all the other files.
*/
- inode->i_state |= I_DIRTY_PAGES;
- redirty_tail(inode);
+ queue_dirty_inode(inode);
}
} else if (inode->i_state & I_DIRTY) {
/*
* Someone redirtied the inode while were writing back
* the pages.
*/
- redirty_tail(inode);
+ queue_dirty_inode(inode);
} else if (atomic_read(&inode->i_count)) {
/*
* The inode is clean, inuse
@@ -405,7 +400,7 @@ __writeback_single_inode(struct inode *i
* on s_io. We'll have another go at writing back this inode
* when we completed a full scan of s_io.
*/
- requeue_io(inode);
+ queue_for_more_io(inode);
/*
* Even if we don't actually write the inode itself here,
@@ -482,7 +477,7 @@ int generic_sync_sb_inodes(struct super_
long pages_skipped;
if (!bdi_cap_writeback_dirty(bdi)) {
- redirty_tail(inode);
+ queue_dirty_inode(inode);
if (sb_is_blkdev_sb(sb)) {
/*
* Dirty memory-backed blockdev: the ramdisk
@@ -502,14 +497,14 @@ int generic_sync_sb_inodes(struct super_
wbc->encountered_congestion = 1;
if (!sb_is_blkdev_sb(sb))
break; /* Skip a congested fs */
- requeue_io(inode);
+ queue_for_more_io(inode);
continue; /* Skip a congested blockdev */
}
if (wbc->bdi && bdi != wbc->bdi) {
if (!sb_is_blkdev_sb(sb))
break; /* fs has the wrong queue */
- requeue_io(inode);
+ queue_for_more_io(inode);
continue; /* blockdev has wrong queue */
}
@@ -523,12 +518,8 @@ int generic_sync_sb_inodes(struct super_
err = __writeback_single_inode(inode, wbc);
if (!ret)
ret = err;
- if (wbc->sync_mode == WB_SYNC_HOLD) {
- check_dirty_inode(inode);
- inode->dirtied_when = jiffies;
- list_move(&inode->i_list, &sb->s_dirty);
- check_dirty_inode(inode);
- }
+ if (wbc->sync_mode == WB_SYNC_HOLD)
+ queue_dirty_inode(inode);
if (current_is_pdflush())
writeback_release(bdi);
if (wbc->pages_skipped != pages_skipped) {
@@ -536,7 +527,7 @@ int generic_sync_sb_inodes(struct super_
* writeback is not making progress due to locked
* buffers. Skip this inode for now.
*/
- redirty_tail(inode);
+ queue_dirty_inode(inode);
}
spin_unlock(&inode_lock);
iput(inode);
--
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 3/4] writeback: function renames and cleanups
[not found] ` <20070810063419.657077419@mail.ustc.edu.cn>
2007-08-10 6:34 ` [PATCH 3/4] writeback: function renames and cleanups Fengguang Wu
@ 2007-08-10 6:34 ` Fengguang Wu
1 sibling, 0 replies; 13+ messages in thread
From: Fengguang Wu @ 2007-08-10 6:34 UTC (permalink / raw)
To: Andrew Morton
Cc: Ken Chen, Andrew Morton, Miklos Szeredi, linux-kernel,
linux-fsdevel
[-- Attachment #1: fs-writeback-cleanup.patch --]
[-- Type: text/plain, Size: 7280 bytes --]
Two function renames:
- rename redirty_tail() to queue_dirty_inode()
- rename requeue_io() to queue_for_more_io()
Also some code cleanups on fs-writeback.c. No behavior changes.
Cc: Ken Chen <kenchen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
fs/fs-writeback.c | 133 ++++++++++++++++++++------------------------
1 file changed, 62 insertions(+), 71 deletions(-)
--- linux-2.6.23-rc1-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc1-mm2/fs/fs-writeback.c
@@ -102,6 +102,55 @@ int sb_has_dirty_inodes(struct super_blo
}
EXPORT_SYMBOL(sb_has_dirty_inodes);
+/*
+ * Enqueue a newly dirtied inode:
+ * - set its when-it-was dirtied timestamp
+ * - move it to the furthest end of its superblock's dirty-inode list
+ */
+static void queue_dirty_inode(struct inode *inode)
+{
+ check_dirty_inode(inode);
+ inode->dirtied_when = jiffies;
+ list_move(&inode->i_list, &inode->i_sb->s_dirty);
+ check_dirty_inode(inode);
+}
+
+/*
+ * Queue an inode for more io in the next full scan of s_io.
+ */
+static void queue_for_more_io(struct inode *inode)
+{
+ check_dirty_inode(inode);
+ list_move(&inode->i_list, &inode->i_sb->s_more_io);
+ check_dirty_inode(inode);
+}
+
+/*
+ * Queue all possible inodes for a run of io.
+ * The resulting s_io is in order of:
+ * - inodes queued for more io from s_more_io(once for a full scan of s_io)
+ * - possible remaining inodes in s_io(was a partial scan)
+ * - dirty inodes (old enough) from s_dirty
+ */
+static void queue_inodes_for_io(struct super_block *sb,
+ unsigned long *older_than_this)
+{
+ check_dirty_inode_list(sb);
+ if (list_empty(&sb->s_io))
+ list_splice_init(&sb->s_more_io, &sb->s_io); /* eldest first */
+ check_dirty_inode_list(sb);
+ while (!list_empty(&sb->s_dirty)) {
+ struct inode *inode = list_entry(sb->s_dirty.prev,
+ struct inode, i_list);
+ /* Was this inode dirtied too recently? */
+ if (older_than_this &&
+ time_after(inode->dirtied_when, *older_than_this))
+ break;
+ list_move(&inode->i_list, &sb->s_io);
+ }
+ check_dirty_inode_list(sb);
+}
+
/**
* __mark_inode_dirty - internal function
* @inode: inode to mark
@@ -199,12 +248,8 @@ void __mark_inode_dirty(struct inode *in
* If the inode was already on s_dirty/s_io/s_more_io, don't
* reposition it (that would break s_dirty time-ordering).
*/
- if (!was_dirty) {
- check_dirty_inode(inode);
- inode->dirtied_when = jiffies;
- list_move(&inode->i_list, &sb->s_dirty);
- check_dirty_inode(inode);
- }
+ if (!was_dirty)
+ queue_dirty_inode(inode);
}
out:
spin_unlock(&inode_lock);
@@ -219,55 +264,6 @@ static int write_inode(struct inode *ino
return 0;
}
-/*
- * Enqueue a newly dirtied inode:
- * - set its when-it-was dirtied timestamp
- * - move it to the furthest end of its superblock's dirty-inode list
- */
-static void redirty_tail(struct inode *inode)
-{
- check_dirty_inode(inode);
- inode->dirtied_when = jiffies;
- list_move(&inode->i_list, &inode->i_sb->s_dirty);
- check_dirty_inode(inode);
-}
-
-/*
- * Queue an inode for more io in the next full scan of s_io.
- */
-static void requeue_io(struct inode *inode)
-{
- check_dirty_inode(inode);
- list_move(&inode->i_list, &inode->i_sb->s_more_io);
- check_dirty_inode(inode);
-}
-
-/*
- * Queue all possible inodes for a run of io.
- * The resulting s_io is in order of:
- * - inodes queued for more io from s_more_io(once for a full scan of s_io)
- * - possible remaining inodes in s_io(was a partial scan)
- * - dirty inodes (old enough) from s_dirty
- */
-static void queue_inodes_for_io(struct super_block *sb,
- unsigned long *older_than_this)
-{
- check_dirty_inode_list(sb);
- if (list_empty(&sb->s_io))
- list_splice_init(&sb->s_more_io, &sb->s_io); /* eldest first */
- check_dirty_inode_list(sb);
- while (!list_empty(&sb->s_dirty)) {
- struct inode *inode = list_entry(sb->s_dirty.prev,
- struct inode, i_list);
- /* Was this inode dirtied too recently? */
- if (older_than_this &&
- time_after(inode->dirtied_when, *older_than_this))
- break;
- list_move(&inode->i_list, &sb->s_io);
- }
- check_dirty_inode_list(sb);
-}
-
static void inode_sync_complete(struct inode *inode)
{
/*
@@ -329,6 +325,7 @@ __sync_single_inode(struct inode *inode,
* sometimes bales out without doing anything. Redirty
* the inode; Move it from s_io onto s_more_io/s_dirty.
*/
+ inode->i_state |= I_DIRTY_PAGES;
/*
* akpm: if the caller was the kupdate function we put
* this inode at the head of s_dirty so it gets first
@@ -344,8 +341,7 @@ __sync_single_inode(struct inode *inode,
* to s_more_io so it will get more writeout as
* soon as the queue becomes uncongested.
*/
- inode->i_state |= I_DIRTY_PAGES;
- requeue_io(inode);
+ queue_for_more_io(inode);
} else {
/*
* Otherwise fully redirty the inode so that
@@ -354,15 +350,14 @@ __sync_single_inode(struct inode *inode,
* file would indefinitely suspend writeout of
* all the other files.
*/
- inode->i_state |= I_DIRTY_PAGES;
- redirty_tail(inode);
+ queue_dirty_inode(inode);
}
} else if (inode->i_state & I_DIRTY) {
/*
* Someone redirtied the inode while were writing back
* the pages.
*/
- redirty_tail(inode);
+ queue_dirty_inode(inode);
} else if (atomic_read(&inode->i_count)) {
/*
* The inode is clean, inuse
@@ -405,7 +400,7 @@ __writeback_single_inode(struct inode *i
* on s_io. We'll have another go at writing back this inode
* when we completed a full scan of s_io.
*/
- requeue_io(inode);
+ queue_for_more_io(inode);
/*
* Even if we don't actually write the inode itself here,
@@ -482,7 +477,7 @@ int generic_sync_sb_inodes(struct super_
long pages_skipped;
if (!bdi_cap_writeback_dirty(bdi)) {
- redirty_tail(inode);
+ queue_dirty_inode(inode);
if (sb_is_blkdev_sb(sb)) {
/*
* Dirty memory-backed blockdev: the ramdisk
@@ -502,14 +497,14 @@ int generic_sync_sb_inodes(struct super_
wbc->encountered_congestion = 1;
if (!sb_is_blkdev_sb(sb))
break; /* Skip a congested fs */
- requeue_io(inode);
+ queue_for_more_io(inode);
continue; /* Skip a congested blockdev */
}
if (wbc->bdi && bdi != wbc->bdi) {
if (!sb_is_blkdev_sb(sb))
break; /* fs has the wrong queue */
- requeue_io(inode);
+ queue_for_more_io(inode);
continue; /* blockdev has wrong queue */
}
@@ -523,12 +518,8 @@ int generic_sync_sb_inodes(struct super_
err = __writeback_single_inode(inode, wbc);
if (!ret)
ret = err;
- if (wbc->sync_mode == WB_SYNC_HOLD) {
- check_dirty_inode(inode);
- inode->dirtied_when = jiffies;
- list_move(&inode->i_list, &sb->s_dirty);
- check_dirty_inode(inode);
- }
+ if (wbc->sync_mode == WB_SYNC_HOLD)
+ queue_dirty_inode(inode);
if (current_is_pdflush())
writeback_release(bdi);
if (wbc->pages_skipped != pages_skipped) {
@@ -536,7 +527,7 @@ int generic_sync_sb_inodes(struct super_
* writeback is not making progress due to locked
* buffers. Skip this inode for now.
*/
- redirty_tail(inode);
+ queue_dirty_inode(inode);
}
spin_unlock(&inode_lock);
iput(inode);
--
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 4/4] writeback: fix ntfs with sb_has_dirty_inodes()
[not found] ` <20070810063419.786586676@mail.ustc.edu.cn>
2007-08-10 6:34 ` [PATCH 4/4] writeback: fix ntfs with sb_has_dirty_inodes() Fengguang Wu
@ 2007-08-10 6:34 ` Fengguang Wu
1 sibling, 0 replies; 13+ messages in thread
From: Fengguang Wu @ 2007-08-10 6:34 UTC (permalink / raw)
To: Andrew Morton
Cc: Ken Chen, Anton Altaparmakov, Andrew Morton, Miklos Szeredi,
linux-kernel, linux-fsdevel
[-- Attachment #1: nfs-dirty-inodes.patch --]
[-- Type: text/plain, Size: 1453 bytes --]
NTFS's if-condition on dirty inodes is not complete.
Fix it with sb_has_dirty_inodes().
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Ken Chen <kenchen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
--- linux-2.6.23-rc1-mm2.orig/fs/ntfs/super.c
+++ linux-2.6.23-rc1-mm2/fs/ntfs/super.c
@@ -2381,14 +2381,14 @@ static void ntfs_put_super(struct super_
*/
ntfs_commit_inode(vol->mft_ino);
write_inode_now(vol->mft_ino, 1);
- if (!list_empty(&sb->s_dirty)) {
+ if (sb_has_dirty_inodes(sb)) {
const char *s1, *s2;
mutex_lock(&vol->mft_ino->i_mutex);
truncate_inode_pages(vol->mft_ino->i_mapping, 0);
mutex_unlock(&vol->mft_ino->i_mutex);
write_inode_now(vol->mft_ino, 1);
- if (!list_empty(&sb->s_dirty)) {
+ if (sb_has_dirty_inodes(sb)) {
static const char *_s1 = "inodes";
static const char *_s2 = "";
s1 = _s1;
--- linux-2.6.23-rc1-mm2.orig/include/linux/fs.h
+++ linux-2.6.23-rc1-mm2/include/linux/fs.h
@@ -1772,6 +1772,7 @@ extern int bdev_read_only(struct block_d
extern int set_blocksize(struct block_device *, int);
extern int sb_set_blocksize(struct super_block *, int);
extern int sb_min_blocksize(struct super_block *, int);
+extern int sb_has_dirty_inodes(struct super_block *);
extern int generic_file_mmap(struct file *, struct vm_area_struct *);
extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *);
--
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 4/4] writeback: fix ntfs with sb_has_dirty_inodes()
[not found] ` <20070810063419.786586676@mail.ustc.edu.cn>
@ 2007-08-10 6:34 ` Fengguang Wu
2007-08-10 6:34 ` Fengguang Wu
1 sibling, 0 replies; 13+ messages in thread
From: Fengguang Wu @ 2007-08-10 6:34 UTC (permalink / raw)
To: Andrew Morton
Cc: Ken Chen, Anton Altaparmakov, Andrew Morton, Miklos Szeredi,
linux-kernel, linux-fsdevel
[-- Attachment #1: nfs-dirty-inodes.patch --]
[-- Type: text/plain, Size: 1453 bytes --]
NTFS's if-condition on dirty inodes is not complete.
Fix it with sb_has_dirty_inodes().
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Ken Chen <kenchen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
--- linux-2.6.23-rc1-mm2.orig/fs/ntfs/super.c
+++ linux-2.6.23-rc1-mm2/fs/ntfs/super.c
@@ -2381,14 +2381,14 @@ static void ntfs_put_super(struct super_
*/
ntfs_commit_inode(vol->mft_ino);
write_inode_now(vol->mft_ino, 1);
- if (!list_empty(&sb->s_dirty)) {
+ if (sb_has_dirty_inodes(sb)) {
const char *s1, *s2;
mutex_lock(&vol->mft_ino->i_mutex);
truncate_inode_pages(vol->mft_ino->i_mapping, 0);
mutex_unlock(&vol->mft_ino->i_mutex);
write_inode_now(vol->mft_ino, 1);
- if (!list_empty(&sb->s_dirty)) {
+ if (sb_has_dirty_inodes(sb)) {
static const char *_s1 = "inodes";
static const char *_s2 = "";
s1 = _s1;
--- linux-2.6.23-rc1-mm2.orig/include/linux/fs.h
+++ linux-2.6.23-rc1-mm2/include/linux/fs.h
@@ -1772,6 +1772,7 @@ extern int bdev_read_only(struct block_d
extern int set_blocksize(struct block_device *, int);
extern int sb_set_blocksize(struct super_block *, int);
extern int sb_min_blocksize(struct super_block *, int);
+extern int sb_has_dirty_inodes(struct super_block *);
extern int generic_file_mmap(struct file *, struct vm_area_struct *);
extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *);
--
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/4] writeback: 3-queue based writeback schedule
[not found] ` <20070810164715.GA5508@mail.ustc.edu.cn>
@ 2007-08-10 16:47 ` Fengguang Wu
2007-08-10 16:47 ` Fengguang Wu
[not found] ` <20070810170534.GA5137@mail.ustc.edu.cn>
2 siblings, 0 replies; 13+ messages in thread
From: Fengguang Wu @ 2007-08-10 16:47 UTC (permalink / raw)
To: Andrew Morton
Cc: Ken Chen, Andrew Morton, Miklos Szeredi, linux-kernel,
linux-fsdevel
On Fri, Aug 10, 2007 at 02:34:14PM +0800, Fengguang Wu wrote:
> Properly manage the 3 queues of sb->s_dirty/s_io/s_more_io so that
> - time-ordering of dirtied_when can be easily maintained
> - writeback can continue from where previous run left out
>
> The majority work has been done by Andrew Morton and Ken Chen,
> this patch just clarifies the roles of the 3 queues:
> - s_dirty for io delay(up to dirty_expire_interval)
> - s_io for io run(a full scan of s_io may involve multiple runs)
> - s_more_io for io continuation
>
> The following paradigm shows the data flow.
>
> requeue on new scan(empty s_io)
> +-----------------------------+
> | |
> dirty old | |
> inodes enough V |
> ======> s_dirty ======> s_io |
> ^ | requeue io |
> | +---------------------> s_more_io
> | hold back |
> +----------------+----------> disk write requests
>
> sb->s_dirty: a FIFO queue
> - s_dirty hosts not-yet-expired(recently dirtied) dirty inodes
> - once expired, inodes will be moved out of s_dirty and *never put back*
> (unless for some reason we have to hold on the inode for some time)
>
> sb->s_io and sb->s_more_io: a cyclic queue scanned for io
> - on each run of generic_sync_sb_inodes(), some more s_dirty inodes may be
> appended to s_io
> - on each full scan of s_io, all s_more_io inodes will be moved back to s_io
> - large files that cannot be synced in one run will be moved to s_more_io for
> retry on next full scan
In fact s_more_io is no longer necessary. We end up with a priority
io-delaying queue s_dirty and a cyclic io-syncing queue s_io. They are
properly decoupled. More flexible data structure can be used for
s_dirty, if we want to redirty an inode with arbitrary delays. Also
more priority queues can be introduced in addition to s_dirty. For
example, we can designate a queue s_dirty_atime for inodes dirtied
only by `atime', and sync them lazily.
> inode->dirtied_when
> - inode->dirtied_when is updated to the *current* jiffies on pushing into
> s_dirty, and is never changed in other cases.
> - time-ordering thus can be simply ensured while moving inodes between lists,
> since (time order == enqueue order)
>
> Cc: Ken Chen <kenchen@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
> ---
> fs/fs-writeback.c | 106 +++++++++++++++++++++-----------------------
> 1 file changed, 52 insertions(+), 54 deletions(-)
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/4] writeback: 3-queue based writeback schedule
[not found] ` <20070810164715.GA5508@mail.ustc.edu.cn>
2007-08-10 16:47 ` Fengguang Wu
@ 2007-08-10 16:47 ` Fengguang Wu
[not found] ` <20070810170534.GA5137@mail.ustc.edu.cn>
2 siblings, 0 replies; 13+ messages in thread
From: Fengguang Wu @ 2007-08-10 16:47 UTC (permalink / raw)
To: Andrew Morton
Cc: Ken Chen, Andrew Morton, Miklos Szeredi, linux-kernel,
linux-fsdevel
On Fri, Aug 10, 2007 at 02:34:14PM +0800, Fengguang Wu wrote:
> Properly manage the 3 queues of sb->s_dirty/s_io/s_more_io so that
> - time-ordering of dirtied_when can be easily maintained
> - writeback can continue from where previous run left out
>
> The majority work has been done by Andrew Morton and Ken Chen,
> this patch just clarifies the roles of the 3 queues:
> - s_dirty for io delay(up to dirty_expire_interval)
> - s_io for io run(a full scan of s_io may involve multiple runs)
> - s_more_io for io continuation
>
> The following paradigm shows the data flow.
>
> requeue on new scan(empty s_io)
> +-----------------------------+
> | |
> dirty old | |
> inodes enough V |
> ======> s_dirty ======> s_io |
> ^ | requeue io |
> | +---------------------> s_more_io
> | hold back |
> +----------------+----------> disk write requests
>
> sb->s_dirty: a FIFO queue
> - s_dirty hosts not-yet-expired(recently dirtied) dirty inodes
> - once expired, inodes will be moved out of s_dirty and *never put back*
> (unless for some reason we have to hold on the inode for some time)
>
> sb->s_io and sb->s_more_io: a cyclic queue scanned for io
> - on each run of generic_sync_sb_inodes(), some more s_dirty inodes may be
> appended to s_io
> - on each full scan of s_io, all s_more_io inodes will be moved back to s_io
> - large files that cannot be synced in one run will be moved to s_more_io for
> retry on next full scan
In fact s_more_io is no longer necessary. We end up with a priority
io-delaying queue s_dirty and a cyclic io-syncing queue s_io. They are
properly decoupled. More flexible data structure can be used for
s_dirty, if we want to redirty an inode with arbitrary delays. Also
more priority queues can be introduced in addition to s_dirty. For
example, we can designate a queue s_dirty_atime for inodes dirtied
only by `atime', and sync them lazily.
> inode->dirtied_when
> - inode->dirtied_when is updated to the *current* jiffies on pushing into
> s_dirty, and is never changed in other cases.
> - time-ordering thus can be simply ensured while moving inodes between lists,
> since (time order == enqueue order)
>
> Cc: Ken Chen <kenchen@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
> ---
> fs/fs-writeback.c | 106 +++++++++++++++++++++-----------------------
> 1 file changed, 52 insertions(+), 54 deletions(-)
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/4] writeback: 3-queue based writeback schedule
[not found] ` <20070810170534.GA5137@mail.ustc.edu.cn>
@ 2007-08-10 17:05 ` Fengguang Wu
0 siblings, 0 replies; 13+ messages in thread
From: Fengguang Wu @ 2007-08-10 17:05 UTC (permalink / raw)
To: Andrew Morton, Ken Chen, Andrew Morton, Miklos Szeredi,
linux-kernel, linux-fsdeve
On Sat, Aug 11, 2007 at 12:47:15AM +0800, Fengguang Wu wrote:
> In fact s_more_io is no longer necessary. We end up with a priority
Ah sorry, s_more_io is still needed to keep the time-ordering. I was
thinking of schedule fairness, in which sense only one cyclic list
will be sufficient.
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2007-08-10 17:05 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20070810063412.238042387@mail.ustc.edu.cn>
2007-08-10 6:34 ` [PATCH 0/4] [RFC][PATCH] fs-writeback: redefining the dirty inode queues Fengguang Wu
2007-08-10 6:34 ` Fengguang Wu
[not found] ` <20070810063419.454829766@mail.ustc.edu.cn>
2007-08-10 6:34 ` [PATCH 1/4] writeback: check time-ordering of s_io and s_more_io Fengguang Wu
2007-08-10 6:34 ` Fengguang Wu
[not found] ` <20070810063419.549052142@mail.ustc.edu.cn>
2007-08-10 6:34 ` [PATCH 2/4] writeback: 3-queue based writeback schedule Fengguang Wu
2007-08-10 6:34 ` Fengguang Wu
[not found] ` <20070810164715.GA5508@mail.ustc.edu.cn>
2007-08-10 16:47 ` Fengguang Wu
2007-08-10 16:47 ` Fengguang Wu
[not found] ` <20070810170534.GA5137@mail.ustc.edu.cn>
2007-08-10 17:05 ` Fengguang Wu
[not found] ` <20070810063419.657077419@mail.ustc.edu.cn>
2007-08-10 6:34 ` [PATCH 3/4] writeback: function renames and cleanups Fengguang Wu
2007-08-10 6:34 ` Fengguang Wu
[not found] ` <20070810063419.786586676@mail.ustc.edu.cn>
2007-08-10 6:34 ` [PATCH 4/4] writeback: fix ntfs with sb_has_dirty_inodes() Fengguang Wu
2007-08-10 6:34 ` Fengguang Wu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).