* [PATCH 0/4] [RFC][PATCH] fs-writeback: redefining the dirty inode queues [not found] <20070810063412.238042387@mail.ustc.edu.cn> @ 2007-08-10 6:34 ` Fengguang Wu 2007-08-10 6:34 ` Fengguang Wu ` (4 subsequent siblings) 5 siblings, 0 replies; 13+ messages in thread From: Fengguang Wu @ 2007-08-10 6:34 UTC (permalink / raw) To: Andrew Morton; +Cc: Ken Chen, Miklos Szeredi, linux-kernel, linux-fsdevel Andrew, I'd like to propose a cleaner way of using the s_dirty, s_io, s_more_io queues for the writeback of dirty inodes. The basic idea is to clearly define the function of the queues, especially to decouple s_diry from s_io/s_more_io. The details are in the changelog of patch 2. The patches are some cleanups on top of Andrew's s_dirty time-ordering patches and Ken's s_more_io patch: [PATCH 1/4] writeback: check time-ordering of s_io and s_more_io [PATCH 2/4] writeback: 3-queue based writeback schedule [PATCH 3/4] writeback: function renames and cleanups [PATCH 4/4] writeback: fix ntfs with sb_has_dirty_inodes() fs/fs-writeback.c | 196 +++++++++++++++++++++++-------------------- fs/ntfs/super.c | 4 include/linux/fs.h | 1 3 files changed, 108 insertions(+), 93 deletions(-) Note that the patches need rework for inserting into the right place of your queue of -mm patches. I'll take care of it in the next take. Thank you, Fengguang -- ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 0/4] [RFC][PATCH] fs-writeback: redefining the dirty inode queues [not found] <20070810063412.238042387@mail.ustc.edu.cn> 2007-08-10 6:34 ` [PATCH 0/4] [RFC][PATCH] fs-writeback: redefining the dirty inode queues Fengguang Wu @ 2007-08-10 6:34 ` Fengguang Wu [not found] ` <20070810063419.454829766@mail.ustc.edu.cn> ` (3 subsequent siblings) 5 siblings, 0 replies; 13+ messages in thread From: Fengguang Wu @ 2007-08-10 6:34 UTC (permalink / raw) To: Andrew Morton; +Cc: Ken Chen, Miklos Szeredi, linux-kernel, linux-fsdevel Andrew, I'd like to propose a cleaner way of using the s_dirty, s_io, s_more_io queues for the writeback of dirty inodes. The basic idea is to clearly define the function of the queues, especially to decouple s_diry from s_io/s_more_io. The details are in the changelog of patch 2. The patches are some cleanups on top of Andrew's s_dirty time-ordering patches and Ken's s_more_io patch: [PATCH 1/4] writeback: check time-ordering of s_io and s_more_io [PATCH 2/4] writeback: 3-queue based writeback schedule [PATCH 3/4] writeback: function renames and cleanups [PATCH 4/4] writeback: fix ntfs with sb_has_dirty_inodes() fs/fs-writeback.c | 196 +++++++++++++++++++++++-------------------- fs/ntfs/super.c | 4 include/linux/fs.h | 1 3 files changed, 108 insertions(+), 93 deletions(-) Note that the patches need rework for inserting into the right place of your queue of -mm patches. I'll take care of it in the next take. Thank you, Fengguang -- ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <20070810063419.454829766@mail.ustc.edu.cn>]
* [PATCH 1/4] writeback: check time-ordering of s_io and s_more_io [not found] ` <20070810063419.454829766@mail.ustc.edu.cn> @ 2007-08-10 6:34 ` Fengguang Wu 2007-08-10 6:34 ` Fengguang Wu 1 sibling, 0 replies; 13+ messages in thread From: Fengguang Wu @ 2007-08-10 6:34 UTC (permalink / raw) To: Andrew Morton Cc: Ken Chen, Andrew Morton, Miklos Szeredi, linux-kernel, linux-fsdevel [-- Attachment #1: io-dirty-check.patch --] [-- Type: text/plain, Size: 3964 bytes --] It helps catch bugs like this: [ 738.645689] fs/fs-writeback.c:535: s_dirty got screwed up [ 738.646114] ffff8100028532b0:4295082249 [ 738.646255] ffff810002856858:4295082259 [ 738.646388] ffff810002831b58:4295082667 [ 738.646520] ffff81000281b1b0:4295082671 [ 738.646651] ffff81000281d798:4295083507 ========================================== s_dirty/s_io [ 738.646783] ffff81000287e998:4295081393 [ 738.646916] ffff81000287e430:4295081403 [ 738.647068] ffff8100028789d8:4295081409 [ 738.647212] ffff810002878470:4295081415 [ 738.647358] ffff810002884a18:4295081421 [ 738.647503] ffff8100028844b0:4295081427 [ 738.647648] ffff810002890a58:4295081433 [ 738.647782] ffff8100028904f0:4295081441 [ 738.647894] ffff81000288da98:4295081449 [ 738.648011] ffff81000288d530:4295081455 [ 738.648123] ffff810002897ad8:4295081461 [ 738.648235] ffff810002897570:4295081469 [ 738.648347] ffff810002894b18:4295081477 [ 738.648459] ffff8100028945b0:4295081483 The buggy line 534 is list_splice_init(&sb->s_io, sb->s_dirty.prev); This is not the only time-ordering bug in linux-2.6.23-rc1-mm2. Let's fix them all. Cc: Ken Chen <kenchen@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- fs/fs-writeback.c | 37 +++++++++++++++++++++++++++++++------ 1 file changed, 31 insertions(+), 6 deletions(-) --- linux-2.6.23-rc1-mm2.orig/fs/fs-writeback.c +++ linux-2.6.23-rc1-mm2/fs/fs-writeback.c @@ -26,12 +26,12 @@ int sysctl_inode_debug __read_mostly; -static int __check(struct super_block *sb, int print_stuff) +static int __check(struct list_head *head, int print_stuff) { - struct list_head *cursor = &sb->s_dirty; + struct list_head *cursor = head; unsigned long dirtied_when = 0; - while ((cursor = cursor->prev) != &sb->s_dirty) { + while ((cursor = cursor->prev) != head) { struct inode *inode = list_entry(cursor, struct inode, i_list); if (print_stuff) { printk("%p:%lu\n", inode, inode->dirtied_when); @@ -51,14 +51,32 @@ static void __check_dirty_inode_list(str if (!sysctl_inode_debug) return; - if (__check(sb, 0)) { + if (__check(&sb->s_dirty, 0)) { sysctl_inode_debug = 0; if (inode) printk("%s:%d: s_dirty got screwed up. inode=%p:%lu\n", file, line, inode, inode->dirtied_when); else printk("%s:%d: s_dirty got screwed up\n", file, line); - __check(sb, 1); + __check(&sb->s_dirty, 1); + } + if (__check(&sb->s_io, 0)) { + sysctl_inode_debug = 0; + if (inode) + printk("%s:%d: s_io got screwed up. inode=%p:%lu\n", + file, line, inode, inode->dirtied_when); + else + printk("%s:%d: s_io got screwed up\n", file, line); + __check(&sb->s_io, 1); + } + if (__check(&sb->s_more_io, 0)) { + sysctl_inode_debug = 0; + if (inode) + printk("%s:%d: s_more_io got screwed up. inode=%p:%lu\n", + file, line, inode, inode->dirtied_when); + else + printk("%s:%d: s_more_io got screwed up\n", file, line); + __check(&sb->s_more_io, 1); } } @@ -223,7 +241,9 @@ static void redirty_tail(struct inode *i */ static void requeue_io(struct inode *inode) { + check_dirty_inode(inode); list_move(&inode->i_list, &inode->i_sb->s_more_io); + check_dirty_inode(inode); } static void inode_sync_complete(struct inode *inode) @@ -483,7 +503,9 @@ int generic_sync_sb_inodes(struct super_ /* Was this inode dirtied too recently? */ if (wbc->older_than_this && time_after(inode->dirtied_when, *wbc->older_than_this)) { + check_dirty_inode_list(sb); list_splice_init(&sb->s_io, sb->s_dirty.prev); + check_dirty_inode_list(sb); break; } @@ -520,8 +542,11 @@ int generic_sync_sb_inodes(struct super_ break; } - if (list_empty(&sb->s_io)) + if (list_empty(&sb->s_io)) { + check_dirty_inode_list(sb); list_splice_init(&sb->s_more_io, &sb->s_io); + check_dirty_inode_list(sb); + } spin_unlock(&inode_lock); return ret; /* Leave any unwritten inodes on s_io */ } -- ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/4] writeback: check time-ordering of s_io and s_more_io [not found] ` <20070810063419.454829766@mail.ustc.edu.cn> 2007-08-10 6:34 ` [PATCH 1/4] writeback: check time-ordering of s_io and s_more_io Fengguang Wu @ 2007-08-10 6:34 ` Fengguang Wu 1 sibling, 0 replies; 13+ messages in thread From: Fengguang Wu @ 2007-08-10 6:34 UTC (permalink / raw) To: Andrew Morton Cc: Ken Chen, Andrew Morton, Miklos Szeredi, linux-kernel, linux-fsdevel [-- Attachment #1: io-dirty-check.patch --] [-- Type: text/plain, Size: 3964 bytes --] It helps catch bugs like this: [ 738.645689] fs/fs-writeback.c:535: s_dirty got screwed up [ 738.646114] ffff8100028532b0:4295082249 [ 738.646255] ffff810002856858:4295082259 [ 738.646388] ffff810002831b58:4295082667 [ 738.646520] ffff81000281b1b0:4295082671 [ 738.646651] ffff81000281d798:4295083507 ========================================== s_dirty/s_io [ 738.646783] ffff81000287e998:4295081393 [ 738.646916] ffff81000287e430:4295081403 [ 738.647068] ffff8100028789d8:4295081409 [ 738.647212] ffff810002878470:4295081415 [ 738.647358] ffff810002884a18:4295081421 [ 738.647503] ffff8100028844b0:4295081427 [ 738.647648] ffff810002890a58:4295081433 [ 738.647782] ffff8100028904f0:4295081441 [ 738.647894] ffff81000288da98:4295081449 [ 738.648011] ffff81000288d530:4295081455 [ 738.648123] ffff810002897ad8:4295081461 [ 738.648235] ffff810002897570:4295081469 [ 738.648347] ffff810002894b18:4295081477 [ 738.648459] ffff8100028945b0:4295081483 The buggy line 534 is list_splice_init(&sb->s_io, sb->s_dirty.prev); This is not the only time-ordering bug in linux-2.6.23-rc1-mm2. Let's fix them all. Cc: Ken Chen <kenchen@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- fs/fs-writeback.c | 37 +++++++++++++++++++++++++++++++------ 1 file changed, 31 insertions(+), 6 deletions(-) --- linux-2.6.23-rc1-mm2.orig/fs/fs-writeback.c +++ linux-2.6.23-rc1-mm2/fs/fs-writeback.c @@ -26,12 +26,12 @@ int sysctl_inode_debug __read_mostly; -static int __check(struct super_block *sb, int print_stuff) +static int __check(struct list_head *head, int print_stuff) { - struct list_head *cursor = &sb->s_dirty; + struct list_head *cursor = head; unsigned long dirtied_when = 0; - while ((cursor = cursor->prev) != &sb->s_dirty) { + while ((cursor = cursor->prev) != head) { struct inode *inode = list_entry(cursor, struct inode, i_list); if (print_stuff) { printk("%p:%lu\n", inode, inode->dirtied_when); @@ -51,14 +51,32 @@ static void __check_dirty_inode_list(str if (!sysctl_inode_debug) return; - if (__check(sb, 0)) { + if (__check(&sb->s_dirty, 0)) { sysctl_inode_debug = 0; if (inode) printk("%s:%d: s_dirty got screwed up. inode=%p:%lu\n", file, line, inode, inode->dirtied_when); else printk("%s:%d: s_dirty got screwed up\n", file, line); - __check(sb, 1); + __check(&sb->s_dirty, 1); + } + if (__check(&sb->s_io, 0)) { + sysctl_inode_debug = 0; + if (inode) + printk("%s:%d: s_io got screwed up. inode=%p:%lu\n", + file, line, inode, inode->dirtied_when); + else + printk("%s:%d: s_io got screwed up\n", file, line); + __check(&sb->s_io, 1); + } + if (__check(&sb->s_more_io, 0)) { + sysctl_inode_debug = 0; + if (inode) + printk("%s:%d: s_more_io got screwed up. inode=%p:%lu\n", + file, line, inode, inode->dirtied_when); + else + printk("%s:%d: s_more_io got screwed up\n", file, line); + __check(&sb->s_more_io, 1); } } @@ -223,7 +241,9 @@ static void redirty_tail(struct inode *i */ static void requeue_io(struct inode *inode) { + check_dirty_inode(inode); list_move(&inode->i_list, &inode->i_sb->s_more_io); + check_dirty_inode(inode); } static void inode_sync_complete(struct inode *inode) @@ -483,7 +503,9 @@ int generic_sync_sb_inodes(struct super_ /* Was this inode dirtied too recently? */ if (wbc->older_than_this && time_after(inode->dirtied_when, *wbc->older_than_this)) { + check_dirty_inode_list(sb); list_splice_init(&sb->s_io, sb->s_dirty.prev); + check_dirty_inode_list(sb); break; } @@ -520,8 +542,11 @@ int generic_sync_sb_inodes(struct super_ break; } - if (list_empty(&sb->s_io)) + if (list_empty(&sb->s_io)) { + check_dirty_inode_list(sb); list_splice_init(&sb->s_more_io, &sb->s_io); + check_dirty_inode_list(sb); + } spin_unlock(&inode_lock); return ret; /* Leave any unwritten inodes on s_io */ } -- ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <20070810063419.549052142@mail.ustc.edu.cn>]
* [PATCH 2/4] writeback: 3-queue based writeback schedule [not found] ` <20070810063419.549052142@mail.ustc.edu.cn> @ 2007-08-10 6:34 ` Fengguang Wu 2007-08-10 6:34 ` Fengguang Wu [not found] ` <20070810164715.GA5508@mail.ustc.edu.cn> 2 siblings, 0 replies; 13+ messages in thread From: Fengguang Wu @ 2007-08-10 6:34 UTC (permalink / raw) To: Andrew Morton Cc: Ken Chen, Andrew Morton, Miklos Szeredi, linux-kernel, linux-fsdevel [-- Attachment #1: io-dirty-order-fix.patch --] [-- Type: text/plain, Size: 9153 bytes --] Properly manage the 3 queues of sb->s_dirty/s_io/s_more_io so that - time-ordering of dirtied_when can be easily maintained - writeback can continue from where previous run left out The majority work has been done by Andrew Morton and Ken Chen, this patch just clarifies the roles of the 3 queues: - s_dirty for io delay(up to dirty_expire_interval) - s_io for io run(a full scan of s_io may involve multiple runs) - s_more_io for io continuation The following paradigm shows the data flow. requeue on new scan(empty s_io) +-----------------------------+ | | dirty old | | inodes enough V | ======> s_dirty ======> s_io | ^ | requeue io | | +---------------------> s_more_io | hold back | +----------------+----------> disk write requests sb->s_dirty: a FIFO queue - s_dirty hosts not-yet-expired(recently dirtied) dirty inodes - once expired, inodes will be moved out of s_dirty and *never put back* (unless for some reason we have to hold on the inode for some time) sb->s_io and sb->s_more_io: a cyclic queue scanned for io - on each run of generic_sync_sb_inodes(), some more s_dirty inodes may be appended to s_io - on each full scan of s_io, all s_more_io inodes will be moved back to s_io - large files that cannot be synced in one run will be moved to s_more_io for retry on next full scan inode->dirtied_when - inode->dirtied_when is updated to the *current* jiffies on pushing into s_dirty, and is never changed in other cases. - time-ordering thus can be simply ensured while moving inodes between lists, since (time order == enqueue order) Cc: Ken Chen <kenchen@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- fs/fs-writeback.c | 106 +++++++++++++++++++++----------------------- 1 file changed, 52 insertions(+), 54 deletions(-) --- linux-2.6.23-rc1-mm2.orig/fs/fs-writeback.c +++ linux-2.6.23-rc1-mm2/fs/fs-writeback.c @@ -93,6 +93,15 @@ static void __check_dirty_inode_list(str __FILE__, __LINE__); \ } while (0) + +int sb_has_dirty_inodes(struct super_block *sb) +{ + return !list_empty(&sb->s_dirty) || + !list_empty(&sb->s_io) || + !list_empty(&sb->s_more_io); +} +EXPORT_SYMBOL(sb_has_dirty_inodes); + /** * __mark_inode_dirty - internal function * @inode: inode to mark @@ -187,7 +196,7 @@ void __mark_inode_dirty(struct inode *in goto out; /* - * If the inode was already on s_dirty or s_io, don't + * If the inode was already on s_dirty/s_io/s_more_io, don't * reposition it (that would break s_dirty time-ordering). */ if (!was_dirty) { @@ -211,33 +220,20 @@ static int write_inode(struct inode *ino } /* - * Redirty an inode: set its when-it-was dirtied timestamp and move it to the - * furthest end of its superblock's dirty-inode list. - * - * Before stamping the inode's ->dirtied_when, we check to see whether it is - * already the most-recently-dirtied inode on the s_dirty list. If that is - * the case then the inode must have been redirtied while it was being written - * out and we don't reset its dirtied_when. + * Enqueue a newly dirtied inode: + * - set its when-it-was dirtied timestamp + * - move it to the furthest end of its superblock's dirty-inode list */ static void redirty_tail(struct inode *inode) { - struct super_block *sb = inode->i_sb; - check_dirty_inode(inode); - if (!list_empty(&sb->s_dirty)) { - struct inode *tail_inode; - - tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list); - if (!time_after_eq(inode->dirtied_when, - tail_inode->dirtied_when)) - inode->dirtied_when = jiffies; - } - list_move(&inode->i_list, &sb->s_dirty); + inode->dirtied_when = jiffies; + list_move(&inode->i_list, &inode->i_sb->s_dirty); check_dirty_inode(inode); } /* - * requeue inode for re-scanning after sb->s_io list is exhausted. + * Queue an inode for more io in the next full scan of s_io. */ static void requeue_io(struct inode *inode) { @@ -246,6 +242,32 @@ static void requeue_io(struct inode *ino check_dirty_inode(inode); } +/* + * Queue all possible inodes for a run of io. + * The resulting s_io is in order of: + * - inodes queued for more io from s_more_io(once for a full scan of s_io) + * - possible remaining inodes in s_io(was a partial scan) + * - dirty inodes (old enough) from s_dirty + */ +static void queue_inodes_for_io(struct super_block *sb, + unsigned long *older_than_this) +{ + check_dirty_inode_list(sb); + if (list_empty(&sb->s_io)) + list_splice_init(&sb->s_more_io, &sb->s_io); /* eldest first */ + check_dirty_inode_list(sb); + while (!list_empty(&sb->s_dirty)) { + struct inode *inode = list_entry(sb->s_dirty.prev, + struct inode, i_list); + /* Was this inode dirtied too recently? */ + if (older_than_this && + time_after(inode->dirtied_when, *older_than_this)) + break; + list_move(&inode->i_list, &sb->s_io); + } + check_dirty_inode_list(sb); +} + static void inode_sync_complete(struct inode *inode) { /* @@ -305,7 +327,7 @@ __sync_single_inode(struct inode *inode, /* * We didn't write back all the pages. nfs_writepages() * sometimes bales out without doing anything. Redirty - * the inode. It is moved from s_io onto s_dirty. + * the inode; Move it from s_io onto s_more_io/s_dirty. */ /* * akpm: if the caller was the kupdate function we put @@ -318,10 +340,9 @@ __sync_single_inode(struct inode *inode, */ if (wbc->for_kupdate) { /* - * For the kupdate function we leave the inode - * at the head of sb_dirty so it will get more - * writeout as soon as the queue becomes - * uncongested. + * For the kupdate function we move the inode + * to s_more_io so it will get more writeout as + * soon as the queue becomes uncongested. */ inode->i_state |= I_DIRTY_PAGES; requeue_io(inode); @@ -380,9 +401,9 @@ __writeback_single_inode(struct inode *i /* * We're skipping this inode because it's locked, and we're not * doing writeback-for-data-integrity. Move it to the head of - * s_dirty so that writeback can proceed with the other inodes + * s_more_io so that writeback can proceed with the other inodes * on s_io. We'll have another go at writing back this inode - * when the s_dirty iodes get moved back onto s_io. + * when we completed a full scan of s_io. */ requeue_io(inode); @@ -444,16 +465,12 @@ __writeback_single_inode(struct inode *i int generic_sync_sb_inodes(struct super_block *sb, struct writeback_control *wbc) { - const unsigned long start = jiffies; /* livelock avoidance */ int ret = 0; spin_lock(&inode_lock); - if (!wbc->for_kupdate || list_empty(&sb->s_io)) { - check_dirty_inode_list(sb); - list_splice_init(&sb->s_dirty, &sb->s_io); - check_dirty_inode_list(sb); - } + if (!wbc->for_kupdate || list_empty(&sb->s_io)) + queue_inodes_for_io(sb, wbc->older_than_this); while (!list_empty(&sb->s_io)) { int err; @@ -496,19 +513,6 @@ int generic_sync_sb_inodes(struct super_ continue; /* blockdev has wrong queue */ } - /* Was this inode dirtied after sync_sb_inodes was called? */ - if (time_after(inode->dirtied_when, start)) - break; - - /* Was this inode dirtied too recently? */ - if (wbc->older_than_this && time_after(inode->dirtied_when, - *wbc->older_than_this)) { - check_dirty_inode_list(sb); - list_splice_init(&sb->s_io, sb->s_dirty.prev); - check_dirty_inode_list(sb); - break; - } - /* Is another pdflush already flushing this queue? */ if (current_is_pdflush() && !writeback_acquire(bdi)) break; @@ -541,12 +545,6 @@ int generic_sync_sb_inodes(struct super_ if (wbc->nr_to_write <= 0) break; } - - if (list_empty(&sb->s_io)) { - check_dirty_inode_list(sb); - list_splice_init(&sb->s_more_io, &sb->s_io); - check_dirty_inode_list(sb); - } spin_unlock(&inode_lock); return ret; /* Leave any unwritten inodes on s_io */ } @@ -566,7 +564,7 @@ static int sync_sb_inodes(struct super_b * Note: * We don't need to grab a reference to superblock here. If it has non-empty * ->s_dirty it's hadn't been killed yet and kill_super() won't proceed - * past sync_inodes_sb() until both the ->s_dirty and ->s_io lists are + * past sync_inodes_sb() until the ->s_dirty/s_io/s_more_io lists are all * empty. Since __sync_single_inode() regains inode_lock before it finally moves * inode from superblock lists we are OK. * @@ -589,7 +587,7 @@ int writeback_inodes(struct writeback_co restart: sb = sb_entry(super_blocks.prev); for (; sb != sb_entry(&super_blocks); sb = sb_entry(sb->s_list.prev)) { - if (!list_empty(&sb->s_dirty) || !list_empty(&sb->s_io)) { + if (sb_has_dirty_inodes(sb)) { /* we're making our own get_super here */ sb->s_count++; spin_unlock(&sb_lock); -- ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/4] writeback: 3-queue based writeback schedule [not found] ` <20070810063419.549052142@mail.ustc.edu.cn> 2007-08-10 6:34 ` [PATCH 2/4] writeback: 3-queue based writeback schedule Fengguang Wu @ 2007-08-10 6:34 ` Fengguang Wu [not found] ` <20070810164715.GA5508@mail.ustc.edu.cn> 2 siblings, 0 replies; 13+ messages in thread From: Fengguang Wu @ 2007-08-10 6:34 UTC (permalink / raw) To: Andrew Morton Cc: Ken Chen, Andrew Morton, Miklos Szeredi, linux-kernel, linux-fsdevel [-- Attachment #1: io-dirty-order-fix.patch --] [-- Type: text/plain, Size: 9153 bytes --] Properly manage the 3 queues of sb->s_dirty/s_io/s_more_io so that - time-ordering of dirtied_when can be easily maintained - writeback can continue from where previous run left out The majority work has been done by Andrew Morton and Ken Chen, this patch just clarifies the roles of the 3 queues: - s_dirty for io delay(up to dirty_expire_interval) - s_io for io run(a full scan of s_io may involve multiple runs) - s_more_io for io continuation The following paradigm shows the data flow. requeue on new scan(empty s_io) +-----------------------------+ | | dirty old | | inodes enough V | ======> s_dirty ======> s_io | ^ | requeue io | | +---------------------> s_more_io | hold back | +----------------+----------> disk write requests sb->s_dirty: a FIFO queue - s_dirty hosts not-yet-expired(recently dirtied) dirty inodes - once expired, inodes will be moved out of s_dirty and *never put back* (unless for some reason we have to hold on the inode for some time) sb->s_io and sb->s_more_io: a cyclic queue scanned for io - on each run of generic_sync_sb_inodes(), some more s_dirty inodes may be appended to s_io - on each full scan of s_io, all s_more_io inodes will be moved back to s_io - large files that cannot be synced in one run will be moved to s_more_io for retry on next full scan inode->dirtied_when - inode->dirtied_when is updated to the *current* jiffies on pushing into s_dirty, and is never changed in other cases. - time-ordering thus can be simply ensured while moving inodes between lists, since (time order == enqueue order) Cc: Ken Chen <kenchen@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- fs/fs-writeback.c | 106 +++++++++++++++++++++----------------------- 1 file changed, 52 insertions(+), 54 deletions(-) --- linux-2.6.23-rc1-mm2.orig/fs/fs-writeback.c +++ linux-2.6.23-rc1-mm2/fs/fs-writeback.c @@ -93,6 +93,15 @@ static void __check_dirty_inode_list(str __FILE__, __LINE__); \ } while (0) + +int sb_has_dirty_inodes(struct super_block *sb) +{ + return !list_empty(&sb->s_dirty) || + !list_empty(&sb->s_io) || + !list_empty(&sb->s_more_io); +} +EXPORT_SYMBOL(sb_has_dirty_inodes); + /** * __mark_inode_dirty - internal function * @inode: inode to mark @@ -187,7 +196,7 @@ void __mark_inode_dirty(struct inode *in goto out; /* - * If the inode was already on s_dirty or s_io, don't + * If the inode was already on s_dirty/s_io/s_more_io, don't * reposition it (that would break s_dirty time-ordering). */ if (!was_dirty) { @@ -211,33 +220,20 @@ static int write_inode(struct inode *ino } /* - * Redirty an inode: set its when-it-was dirtied timestamp and move it to the - * furthest end of its superblock's dirty-inode list. - * - * Before stamping the inode's ->dirtied_when, we check to see whether it is - * already the most-recently-dirtied inode on the s_dirty list. If that is - * the case then the inode must have been redirtied while it was being written - * out and we don't reset its dirtied_when. + * Enqueue a newly dirtied inode: + * - set its when-it-was dirtied timestamp + * - move it to the furthest end of its superblock's dirty-inode list */ static void redirty_tail(struct inode *inode) { - struct super_block *sb = inode->i_sb; - check_dirty_inode(inode); - if (!list_empty(&sb->s_dirty)) { - struct inode *tail_inode; - - tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list); - if (!time_after_eq(inode->dirtied_when, - tail_inode->dirtied_when)) - inode->dirtied_when = jiffies; - } - list_move(&inode->i_list, &sb->s_dirty); + inode->dirtied_when = jiffies; + list_move(&inode->i_list, &inode->i_sb->s_dirty); check_dirty_inode(inode); } /* - * requeue inode for re-scanning after sb->s_io list is exhausted. + * Queue an inode for more io in the next full scan of s_io. */ static void requeue_io(struct inode *inode) { @@ -246,6 +242,32 @@ static void requeue_io(struct inode *ino check_dirty_inode(inode); } +/* + * Queue all possible inodes for a run of io. + * The resulting s_io is in order of: + * - inodes queued for more io from s_more_io(once for a full scan of s_io) + * - possible remaining inodes in s_io(was a partial scan) + * - dirty inodes (old enough) from s_dirty + */ +static void queue_inodes_for_io(struct super_block *sb, + unsigned long *older_than_this) +{ + check_dirty_inode_list(sb); + if (list_empty(&sb->s_io)) + list_splice_init(&sb->s_more_io, &sb->s_io); /* eldest first */ + check_dirty_inode_list(sb); + while (!list_empty(&sb->s_dirty)) { + struct inode *inode = list_entry(sb->s_dirty.prev, + struct inode, i_list); + /* Was this inode dirtied too recently? */ + if (older_than_this && + time_after(inode->dirtied_when, *older_than_this)) + break; + list_move(&inode->i_list, &sb->s_io); + } + check_dirty_inode_list(sb); +} + static void inode_sync_complete(struct inode *inode) { /* @@ -305,7 +327,7 @@ __sync_single_inode(struct inode *inode, /* * We didn't write back all the pages. nfs_writepages() * sometimes bales out without doing anything. Redirty - * the inode. It is moved from s_io onto s_dirty. + * the inode; Move it from s_io onto s_more_io/s_dirty. */ /* * akpm: if the caller was the kupdate function we put @@ -318,10 +340,9 @@ __sync_single_inode(struct inode *inode, */ if (wbc->for_kupdate) { /* - * For the kupdate function we leave the inode - * at the head of sb_dirty so it will get more - * writeout as soon as the queue becomes - * uncongested. + * For the kupdate function we move the inode + * to s_more_io so it will get more writeout as + * soon as the queue becomes uncongested. */ inode->i_state |= I_DIRTY_PAGES; requeue_io(inode); @@ -380,9 +401,9 @@ __writeback_single_inode(struct inode *i /* * We're skipping this inode because it's locked, and we're not * doing writeback-for-data-integrity. Move it to the head of - * s_dirty so that writeback can proceed with the other inodes + * s_more_io so that writeback can proceed with the other inodes * on s_io. We'll have another go at writing back this inode - * when the s_dirty iodes get moved back onto s_io. + * when we completed a full scan of s_io. */ requeue_io(inode); @@ -444,16 +465,12 @@ __writeback_single_inode(struct inode *i int generic_sync_sb_inodes(struct super_block *sb, struct writeback_control *wbc) { - const unsigned long start = jiffies; /* livelock avoidance */ int ret = 0; spin_lock(&inode_lock); - if (!wbc->for_kupdate || list_empty(&sb->s_io)) { - check_dirty_inode_list(sb); - list_splice_init(&sb->s_dirty, &sb->s_io); - check_dirty_inode_list(sb); - } + if (!wbc->for_kupdate || list_empty(&sb->s_io)) + queue_inodes_for_io(sb, wbc->older_than_this); while (!list_empty(&sb->s_io)) { int err; @@ -496,19 +513,6 @@ int generic_sync_sb_inodes(struct super_ continue; /* blockdev has wrong queue */ } - /* Was this inode dirtied after sync_sb_inodes was called? */ - if (time_after(inode->dirtied_when, start)) - break; - - /* Was this inode dirtied too recently? */ - if (wbc->older_than_this && time_after(inode->dirtied_when, - *wbc->older_than_this)) { - check_dirty_inode_list(sb); - list_splice_init(&sb->s_io, sb->s_dirty.prev); - check_dirty_inode_list(sb); - break; - } - /* Is another pdflush already flushing this queue? */ if (current_is_pdflush() && !writeback_acquire(bdi)) break; @@ -541,12 +545,6 @@ int generic_sync_sb_inodes(struct super_ if (wbc->nr_to_write <= 0) break; } - - if (list_empty(&sb->s_io)) { - check_dirty_inode_list(sb); - list_splice_init(&sb->s_more_io, &sb->s_io); - check_dirty_inode_list(sb); - } spin_unlock(&inode_lock); return ret; /* Leave any unwritten inodes on s_io */ } @@ -566,7 +564,7 @@ static int sync_sb_inodes(struct super_b * Note: * We don't need to grab a reference to superblock here. If it has non-empty * ->s_dirty it's hadn't been killed yet and kill_super() won't proceed - * past sync_inodes_sb() until both the ->s_dirty and ->s_io lists are + * past sync_inodes_sb() until the ->s_dirty/s_io/s_more_io lists are all * empty. Since __sync_single_inode() regains inode_lock before it finally moves * inode from superblock lists we are OK. * @@ -589,7 +587,7 @@ int writeback_inodes(struct writeback_co restart: sb = sb_entry(super_blocks.prev); for (; sb != sb_entry(&super_blocks); sb = sb_entry(sb->s_list.prev)) { - if (!list_empty(&sb->s_dirty) || !list_empty(&sb->s_io)) { + if (sb_has_dirty_inodes(sb)) { /* we're making our own get_super here */ sb->s_count++; spin_unlock(&sb_lock); -- ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <20070810164715.GA5508@mail.ustc.edu.cn>]
* Re: [PATCH 2/4] writeback: 3-queue based writeback schedule [not found] ` <20070810164715.GA5508@mail.ustc.edu.cn> @ 2007-08-10 16:47 ` Fengguang Wu 2007-08-10 16:47 ` Fengguang Wu [not found] ` <20070810170534.GA5137@mail.ustc.edu.cn> 2 siblings, 0 replies; 13+ messages in thread From: Fengguang Wu @ 2007-08-10 16:47 UTC (permalink / raw) To: Andrew Morton Cc: Ken Chen, Andrew Morton, Miklos Szeredi, linux-kernel, linux-fsdevel On Fri, Aug 10, 2007 at 02:34:14PM +0800, Fengguang Wu wrote: > Properly manage the 3 queues of sb->s_dirty/s_io/s_more_io so that > - time-ordering of dirtied_when can be easily maintained > - writeback can continue from where previous run left out > > The majority work has been done by Andrew Morton and Ken Chen, > this patch just clarifies the roles of the 3 queues: > - s_dirty for io delay(up to dirty_expire_interval) > - s_io for io run(a full scan of s_io may involve multiple runs) > - s_more_io for io continuation > > The following paradigm shows the data flow. > > requeue on new scan(empty s_io) > +-----------------------------+ > | | > dirty old | | > inodes enough V | > ======> s_dirty ======> s_io | > ^ | requeue io | > | +---------------------> s_more_io > | hold back | > +----------------+----------> disk write requests > > sb->s_dirty: a FIFO queue > - s_dirty hosts not-yet-expired(recently dirtied) dirty inodes > - once expired, inodes will be moved out of s_dirty and *never put back* > (unless for some reason we have to hold on the inode for some time) > > sb->s_io and sb->s_more_io: a cyclic queue scanned for io > - on each run of generic_sync_sb_inodes(), some more s_dirty inodes may be > appended to s_io > - on each full scan of s_io, all s_more_io inodes will be moved back to s_io > - large files that cannot be synced in one run will be moved to s_more_io for > retry on next full scan In fact s_more_io is no longer necessary. We end up with a priority io-delaying queue s_dirty and a cyclic io-syncing queue s_io. They are properly decoupled. More flexible data structure can be used for s_dirty, if we want to redirty an inode with arbitrary delays. Also more priority queues can be introduced in addition to s_dirty. For example, we can designate a queue s_dirty_atime for inodes dirtied only by `atime', and sync them lazily. > inode->dirtied_when > - inode->dirtied_when is updated to the *current* jiffies on pushing into > s_dirty, and is never changed in other cases. > - time-ordering thus can be simply ensured while moving inodes between lists, > since (time order == enqueue order) > > Cc: Ken Chen <kenchen@google.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> > --- > fs/fs-writeback.c | 106 +++++++++++++++++++++----------------------- > 1 file changed, 52 insertions(+), 54 deletions(-) ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/4] writeback: 3-queue based writeback schedule [not found] ` <20070810164715.GA5508@mail.ustc.edu.cn> 2007-08-10 16:47 ` Fengguang Wu @ 2007-08-10 16:47 ` Fengguang Wu [not found] ` <20070810170534.GA5137@mail.ustc.edu.cn> 2 siblings, 0 replies; 13+ messages in thread From: Fengguang Wu @ 2007-08-10 16:47 UTC (permalink / raw) To: Andrew Morton Cc: Ken Chen, Andrew Morton, Miklos Szeredi, linux-kernel, linux-fsdevel On Fri, Aug 10, 2007 at 02:34:14PM +0800, Fengguang Wu wrote: > Properly manage the 3 queues of sb->s_dirty/s_io/s_more_io so that > - time-ordering of dirtied_when can be easily maintained > - writeback can continue from where previous run left out > > The majority work has been done by Andrew Morton and Ken Chen, > this patch just clarifies the roles of the 3 queues: > - s_dirty for io delay(up to dirty_expire_interval) > - s_io for io run(a full scan of s_io may involve multiple runs) > - s_more_io for io continuation > > The following paradigm shows the data flow. > > requeue on new scan(empty s_io) > +-----------------------------+ > | | > dirty old | | > inodes enough V | > ======> s_dirty ======> s_io | > ^ | requeue io | > | +---------------------> s_more_io > | hold back | > +----------------+----------> disk write requests > > sb->s_dirty: a FIFO queue > - s_dirty hosts not-yet-expired(recently dirtied) dirty inodes > - once expired, inodes will be moved out of s_dirty and *never put back* > (unless for some reason we have to hold on the inode for some time) > > sb->s_io and sb->s_more_io: a cyclic queue scanned for io > - on each run of generic_sync_sb_inodes(), some more s_dirty inodes may be > appended to s_io > - on each full scan of s_io, all s_more_io inodes will be moved back to s_io > - large files that cannot be synced in one run will be moved to s_more_io for > retry on next full scan In fact s_more_io is no longer necessary. We end up with a priority io-delaying queue s_dirty and a cyclic io-syncing queue s_io. They are properly decoupled. More flexible data structure can be used for s_dirty, if we want to redirty an inode with arbitrary delays. Also more priority queues can be introduced in addition to s_dirty. For example, we can designate a queue s_dirty_atime for inodes dirtied only by `atime', and sync them lazily. > inode->dirtied_when > - inode->dirtied_when is updated to the *current* jiffies on pushing into > s_dirty, and is never changed in other cases. > - time-ordering thus can be simply ensured while moving inodes between lists, > since (time order == enqueue order) > > Cc: Ken Chen <kenchen@google.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> > --- > fs/fs-writeback.c | 106 +++++++++++++++++++++----------------------- > 1 file changed, 52 insertions(+), 54 deletions(-) ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <20070810170534.GA5137@mail.ustc.edu.cn>]
* Re: [PATCH 2/4] writeback: 3-queue based writeback schedule [not found] ` <20070810170534.GA5137@mail.ustc.edu.cn> @ 2007-08-10 17:05 ` Fengguang Wu 0 siblings, 0 replies; 13+ messages in thread From: Fengguang Wu @ 2007-08-10 17:05 UTC (permalink / raw) To: Andrew Morton, Ken Chen, Andrew Morton, Miklos Szeredi, linux-kernel, linux-fsdeve On Sat, Aug 11, 2007 at 12:47:15AM +0800, Fengguang Wu wrote: > In fact s_more_io is no longer necessary. We end up with a priority Ah sorry, s_more_io is still needed to keep the time-ordering. I was thinking of schedule fairness, in which sense only one cyclic list will be sufficient. ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <20070810063419.657077419@mail.ustc.edu.cn>]
* [PATCH 3/4] writeback: function renames and cleanups [not found] ` <20070810063419.657077419@mail.ustc.edu.cn> @ 2007-08-10 6:34 ` Fengguang Wu 2007-08-10 6:34 ` Fengguang Wu 1 sibling, 0 replies; 13+ messages in thread From: Fengguang Wu @ 2007-08-10 6:34 UTC (permalink / raw) To: Andrew Morton Cc: Ken Chen, Andrew Morton, Miklos Szeredi, linux-kernel, linux-fsdevel [-- Attachment #1: fs-writeback-cleanup.patch --] [-- Type: text/plain, Size: 7280 bytes --] Two function renames: - rename redirty_tail() to queue_dirty_inode() - rename requeue_io() to queue_for_more_io() Also some code cleanups on fs-writeback.c. No behavior changes. Cc: Ken Chen <kenchen@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- fs/fs-writeback.c | 133 ++++++++++++++++++++------------------------ 1 file changed, 62 insertions(+), 71 deletions(-) --- linux-2.6.23-rc1-mm2.orig/fs/fs-writeback.c +++ linux-2.6.23-rc1-mm2/fs/fs-writeback.c @@ -102,6 +102,55 @@ int sb_has_dirty_inodes(struct super_blo } EXPORT_SYMBOL(sb_has_dirty_inodes); +/* + * Enqueue a newly dirtied inode: + * - set its when-it-was dirtied timestamp + * - move it to the furthest end of its superblock's dirty-inode list + */ +static void queue_dirty_inode(struct inode *inode) +{ + check_dirty_inode(inode); + inode->dirtied_when = jiffies; + list_move(&inode->i_list, &inode->i_sb->s_dirty); + check_dirty_inode(inode); +} + +/* + * Queue an inode for more io in the next full scan of s_io. + */ +static void queue_for_more_io(struct inode *inode) +{ + check_dirty_inode(inode); + list_move(&inode->i_list, &inode->i_sb->s_more_io); + check_dirty_inode(inode); +} + +/* + * Queue all possible inodes for a run of io. + * The resulting s_io is in order of: + * - inodes queued for more io from s_more_io(once for a full scan of s_io) + * - possible remaining inodes in s_io(was a partial scan) + * - dirty inodes (old enough) from s_dirty + */ +static void queue_inodes_for_io(struct super_block *sb, + unsigned long *older_than_this) +{ + check_dirty_inode_list(sb); + if (list_empty(&sb->s_io)) + list_splice_init(&sb->s_more_io, &sb->s_io); /* eldest first */ + check_dirty_inode_list(sb); + while (!list_empty(&sb->s_dirty)) { + struct inode *inode = list_entry(sb->s_dirty.prev, + struct inode, i_list); + /* Was this inode dirtied too recently? */ + if (older_than_this && + time_after(inode->dirtied_when, *older_than_this)) + break; + list_move(&inode->i_list, &sb->s_io); + } + check_dirty_inode_list(sb); +} + /** * __mark_inode_dirty - internal function * @inode: inode to mark @@ -199,12 +248,8 @@ void __mark_inode_dirty(struct inode *in * If the inode was already on s_dirty/s_io/s_more_io, don't * reposition it (that would break s_dirty time-ordering). */ - if (!was_dirty) { - check_dirty_inode(inode); - inode->dirtied_when = jiffies; - list_move(&inode->i_list, &sb->s_dirty); - check_dirty_inode(inode); - } + if (!was_dirty) + queue_dirty_inode(inode); } out: spin_unlock(&inode_lock); @@ -219,55 +264,6 @@ static int write_inode(struct inode *ino return 0; } -/* - * Enqueue a newly dirtied inode: - * - set its when-it-was dirtied timestamp - * - move it to the furthest end of its superblock's dirty-inode list - */ -static void redirty_tail(struct inode *inode) -{ - check_dirty_inode(inode); - inode->dirtied_when = jiffies; - list_move(&inode->i_list, &inode->i_sb->s_dirty); - check_dirty_inode(inode); -} - -/* - * Queue an inode for more io in the next full scan of s_io. - */ -static void requeue_io(struct inode *inode) -{ - check_dirty_inode(inode); - list_move(&inode->i_list, &inode->i_sb->s_more_io); - check_dirty_inode(inode); -} - -/* - * Queue all possible inodes for a run of io. - * The resulting s_io is in order of: - * - inodes queued for more io from s_more_io(once for a full scan of s_io) - * - possible remaining inodes in s_io(was a partial scan) - * - dirty inodes (old enough) from s_dirty - */ -static void queue_inodes_for_io(struct super_block *sb, - unsigned long *older_than_this) -{ - check_dirty_inode_list(sb); - if (list_empty(&sb->s_io)) - list_splice_init(&sb->s_more_io, &sb->s_io); /* eldest first */ - check_dirty_inode_list(sb); - while (!list_empty(&sb->s_dirty)) { - struct inode *inode = list_entry(sb->s_dirty.prev, - struct inode, i_list); - /* Was this inode dirtied too recently? */ - if (older_than_this && - time_after(inode->dirtied_when, *older_than_this)) - break; - list_move(&inode->i_list, &sb->s_io); - } - check_dirty_inode_list(sb); -} - static void inode_sync_complete(struct inode *inode) { /* @@ -329,6 +325,7 @@ __sync_single_inode(struct inode *inode, * sometimes bales out without doing anything. Redirty * the inode; Move it from s_io onto s_more_io/s_dirty. */ + inode->i_state |= I_DIRTY_PAGES; /* * akpm: if the caller was the kupdate function we put * this inode at the head of s_dirty so it gets first @@ -344,8 +341,7 @@ __sync_single_inode(struct inode *inode, * to s_more_io so it will get more writeout as * soon as the queue becomes uncongested. */ - inode->i_state |= I_DIRTY_PAGES; - requeue_io(inode); + queue_for_more_io(inode); } else { /* * Otherwise fully redirty the inode so that @@ -354,15 +350,14 @@ __sync_single_inode(struct inode *inode, * file would indefinitely suspend writeout of * all the other files. */ - inode->i_state |= I_DIRTY_PAGES; - redirty_tail(inode); + queue_dirty_inode(inode); } } else if (inode->i_state & I_DIRTY) { /* * Someone redirtied the inode while were writing back * the pages. */ - redirty_tail(inode); + queue_dirty_inode(inode); } else if (atomic_read(&inode->i_count)) { /* * The inode is clean, inuse @@ -405,7 +400,7 @@ __writeback_single_inode(struct inode *i * on s_io. We'll have another go at writing back this inode * when we completed a full scan of s_io. */ - requeue_io(inode); + queue_for_more_io(inode); /* * Even if we don't actually write the inode itself here, @@ -482,7 +477,7 @@ int generic_sync_sb_inodes(struct super_ long pages_skipped; if (!bdi_cap_writeback_dirty(bdi)) { - redirty_tail(inode); + queue_dirty_inode(inode); if (sb_is_blkdev_sb(sb)) { /* * Dirty memory-backed blockdev: the ramdisk @@ -502,14 +497,14 @@ int generic_sync_sb_inodes(struct super_ wbc->encountered_congestion = 1; if (!sb_is_blkdev_sb(sb)) break; /* Skip a congested fs */ - requeue_io(inode); + queue_for_more_io(inode); continue; /* Skip a congested blockdev */ } if (wbc->bdi && bdi != wbc->bdi) { if (!sb_is_blkdev_sb(sb)) break; /* fs has the wrong queue */ - requeue_io(inode); + queue_for_more_io(inode); continue; /* blockdev has wrong queue */ } @@ -523,12 +518,8 @@ int generic_sync_sb_inodes(struct super_ err = __writeback_single_inode(inode, wbc); if (!ret) ret = err; - if (wbc->sync_mode == WB_SYNC_HOLD) { - check_dirty_inode(inode); - inode->dirtied_when = jiffies; - list_move(&inode->i_list, &sb->s_dirty); - check_dirty_inode(inode); - } + if (wbc->sync_mode == WB_SYNC_HOLD) + queue_dirty_inode(inode); if (current_is_pdflush()) writeback_release(bdi); if (wbc->pages_skipped != pages_skipped) { @@ -536,7 +527,7 @@ int generic_sync_sb_inodes(struct super_ * writeback is not making progress due to locked * buffers. Skip this inode for now. */ - redirty_tail(inode); + queue_dirty_inode(inode); } spin_unlock(&inode_lock); iput(inode); -- ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 3/4] writeback: function renames and cleanups [not found] ` <20070810063419.657077419@mail.ustc.edu.cn> 2007-08-10 6:34 ` [PATCH 3/4] writeback: function renames and cleanups Fengguang Wu @ 2007-08-10 6:34 ` Fengguang Wu 1 sibling, 0 replies; 13+ messages in thread From: Fengguang Wu @ 2007-08-10 6:34 UTC (permalink / raw) To: Andrew Morton Cc: Ken Chen, Andrew Morton, Miklos Szeredi, linux-kernel, linux-fsdevel [-- Attachment #1: fs-writeback-cleanup.patch --] [-- Type: text/plain, Size: 7280 bytes --] Two function renames: - rename redirty_tail() to queue_dirty_inode() - rename requeue_io() to queue_for_more_io() Also some code cleanups on fs-writeback.c. No behavior changes. Cc: Ken Chen <kenchen@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- fs/fs-writeback.c | 133 ++++++++++++++++++++------------------------ 1 file changed, 62 insertions(+), 71 deletions(-) --- linux-2.6.23-rc1-mm2.orig/fs/fs-writeback.c +++ linux-2.6.23-rc1-mm2/fs/fs-writeback.c @@ -102,6 +102,55 @@ int sb_has_dirty_inodes(struct super_blo } EXPORT_SYMBOL(sb_has_dirty_inodes); +/* + * Enqueue a newly dirtied inode: + * - set its when-it-was dirtied timestamp + * - move it to the furthest end of its superblock's dirty-inode list + */ +static void queue_dirty_inode(struct inode *inode) +{ + check_dirty_inode(inode); + inode->dirtied_when = jiffies; + list_move(&inode->i_list, &inode->i_sb->s_dirty); + check_dirty_inode(inode); +} + +/* + * Queue an inode for more io in the next full scan of s_io. + */ +static void queue_for_more_io(struct inode *inode) +{ + check_dirty_inode(inode); + list_move(&inode->i_list, &inode->i_sb->s_more_io); + check_dirty_inode(inode); +} + +/* + * Queue all possible inodes for a run of io. + * The resulting s_io is in order of: + * - inodes queued for more io from s_more_io(once for a full scan of s_io) + * - possible remaining inodes in s_io(was a partial scan) + * - dirty inodes (old enough) from s_dirty + */ +static void queue_inodes_for_io(struct super_block *sb, + unsigned long *older_than_this) +{ + check_dirty_inode_list(sb); + if (list_empty(&sb->s_io)) + list_splice_init(&sb->s_more_io, &sb->s_io); /* eldest first */ + check_dirty_inode_list(sb); + while (!list_empty(&sb->s_dirty)) { + struct inode *inode = list_entry(sb->s_dirty.prev, + struct inode, i_list); + /* Was this inode dirtied too recently? */ + if (older_than_this && + time_after(inode->dirtied_when, *older_than_this)) + break; + list_move(&inode->i_list, &sb->s_io); + } + check_dirty_inode_list(sb); +} + /** * __mark_inode_dirty - internal function * @inode: inode to mark @@ -199,12 +248,8 @@ void __mark_inode_dirty(struct inode *in * If the inode was already on s_dirty/s_io/s_more_io, don't * reposition it (that would break s_dirty time-ordering). */ - if (!was_dirty) { - check_dirty_inode(inode); - inode->dirtied_when = jiffies; - list_move(&inode->i_list, &sb->s_dirty); - check_dirty_inode(inode); - } + if (!was_dirty) + queue_dirty_inode(inode); } out: spin_unlock(&inode_lock); @@ -219,55 +264,6 @@ static int write_inode(struct inode *ino return 0; } -/* - * Enqueue a newly dirtied inode: - * - set its when-it-was dirtied timestamp - * - move it to the furthest end of its superblock's dirty-inode list - */ -static void redirty_tail(struct inode *inode) -{ - check_dirty_inode(inode); - inode->dirtied_when = jiffies; - list_move(&inode->i_list, &inode->i_sb->s_dirty); - check_dirty_inode(inode); -} - -/* - * Queue an inode for more io in the next full scan of s_io. - */ -static void requeue_io(struct inode *inode) -{ - check_dirty_inode(inode); - list_move(&inode->i_list, &inode->i_sb->s_more_io); - check_dirty_inode(inode); -} - -/* - * Queue all possible inodes for a run of io. - * The resulting s_io is in order of: - * - inodes queued for more io from s_more_io(once for a full scan of s_io) - * - possible remaining inodes in s_io(was a partial scan) - * - dirty inodes (old enough) from s_dirty - */ -static void queue_inodes_for_io(struct super_block *sb, - unsigned long *older_than_this) -{ - check_dirty_inode_list(sb); - if (list_empty(&sb->s_io)) - list_splice_init(&sb->s_more_io, &sb->s_io); /* eldest first */ - check_dirty_inode_list(sb); - while (!list_empty(&sb->s_dirty)) { - struct inode *inode = list_entry(sb->s_dirty.prev, - struct inode, i_list); - /* Was this inode dirtied too recently? */ - if (older_than_this && - time_after(inode->dirtied_when, *older_than_this)) - break; - list_move(&inode->i_list, &sb->s_io); - } - check_dirty_inode_list(sb); -} - static void inode_sync_complete(struct inode *inode) { /* @@ -329,6 +325,7 @@ __sync_single_inode(struct inode *inode, * sometimes bales out without doing anything. Redirty * the inode; Move it from s_io onto s_more_io/s_dirty. */ + inode->i_state |= I_DIRTY_PAGES; /* * akpm: if the caller was the kupdate function we put * this inode at the head of s_dirty so it gets first @@ -344,8 +341,7 @@ __sync_single_inode(struct inode *inode, * to s_more_io so it will get more writeout as * soon as the queue becomes uncongested. */ - inode->i_state |= I_DIRTY_PAGES; - requeue_io(inode); + queue_for_more_io(inode); } else { /* * Otherwise fully redirty the inode so that @@ -354,15 +350,14 @@ __sync_single_inode(struct inode *inode, * file would indefinitely suspend writeout of * all the other files. */ - inode->i_state |= I_DIRTY_PAGES; - redirty_tail(inode); + queue_dirty_inode(inode); } } else if (inode->i_state & I_DIRTY) { /* * Someone redirtied the inode while were writing back * the pages. */ - redirty_tail(inode); + queue_dirty_inode(inode); } else if (atomic_read(&inode->i_count)) { /* * The inode is clean, inuse @@ -405,7 +400,7 @@ __writeback_single_inode(struct inode *i * on s_io. We'll have another go at writing back this inode * when we completed a full scan of s_io. */ - requeue_io(inode); + queue_for_more_io(inode); /* * Even if we don't actually write the inode itself here, @@ -482,7 +477,7 @@ int generic_sync_sb_inodes(struct super_ long pages_skipped; if (!bdi_cap_writeback_dirty(bdi)) { - redirty_tail(inode); + queue_dirty_inode(inode); if (sb_is_blkdev_sb(sb)) { /* * Dirty memory-backed blockdev: the ramdisk @@ -502,14 +497,14 @@ int generic_sync_sb_inodes(struct super_ wbc->encountered_congestion = 1; if (!sb_is_blkdev_sb(sb)) break; /* Skip a congested fs */ - requeue_io(inode); + queue_for_more_io(inode); continue; /* Skip a congested blockdev */ } if (wbc->bdi && bdi != wbc->bdi) { if (!sb_is_blkdev_sb(sb)) break; /* fs has the wrong queue */ - requeue_io(inode); + queue_for_more_io(inode); continue; /* blockdev has wrong queue */ } @@ -523,12 +518,8 @@ int generic_sync_sb_inodes(struct super_ err = __writeback_single_inode(inode, wbc); if (!ret) ret = err; - if (wbc->sync_mode == WB_SYNC_HOLD) { - check_dirty_inode(inode); - inode->dirtied_when = jiffies; - list_move(&inode->i_list, &sb->s_dirty); - check_dirty_inode(inode); - } + if (wbc->sync_mode == WB_SYNC_HOLD) + queue_dirty_inode(inode); if (current_is_pdflush()) writeback_release(bdi); if (wbc->pages_skipped != pages_skipped) { @@ -536,7 +527,7 @@ int generic_sync_sb_inodes(struct super_ * writeback is not making progress due to locked * buffers. Skip this inode for now. */ - redirty_tail(inode); + queue_dirty_inode(inode); } spin_unlock(&inode_lock); iput(inode); -- ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <20070810063419.786586676@mail.ustc.edu.cn>]
* [PATCH 4/4] writeback: fix ntfs with sb_has_dirty_inodes() [not found] ` <20070810063419.786586676@mail.ustc.edu.cn> @ 2007-08-10 6:34 ` Fengguang Wu 2007-08-10 6:34 ` Fengguang Wu 1 sibling, 0 replies; 13+ messages in thread From: Fengguang Wu @ 2007-08-10 6:34 UTC (permalink / raw) To: Andrew Morton Cc: Ken Chen, Anton Altaparmakov, Andrew Morton, Miklos Szeredi, linux-kernel, linux-fsdevel [-- Attachment #1: nfs-dirty-inodes.patch --] [-- Type: text/plain, Size: 1453 bytes --] NTFS's if-condition on dirty inodes is not complete. Fix it with sb_has_dirty_inodes(). Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Ken Chen <kenchen@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- --- linux-2.6.23-rc1-mm2.orig/fs/ntfs/super.c +++ linux-2.6.23-rc1-mm2/fs/ntfs/super.c @@ -2381,14 +2381,14 @@ static void ntfs_put_super(struct super_ */ ntfs_commit_inode(vol->mft_ino); write_inode_now(vol->mft_ino, 1); - if (!list_empty(&sb->s_dirty)) { + if (sb_has_dirty_inodes(sb)) { const char *s1, *s2; mutex_lock(&vol->mft_ino->i_mutex); truncate_inode_pages(vol->mft_ino->i_mapping, 0); mutex_unlock(&vol->mft_ino->i_mutex); write_inode_now(vol->mft_ino, 1); - if (!list_empty(&sb->s_dirty)) { + if (sb_has_dirty_inodes(sb)) { static const char *_s1 = "inodes"; static const char *_s2 = ""; s1 = _s1; --- linux-2.6.23-rc1-mm2.orig/include/linux/fs.h +++ linux-2.6.23-rc1-mm2/include/linux/fs.h @@ -1772,6 +1772,7 @@ extern int bdev_read_only(struct block_d extern int set_blocksize(struct block_device *, int); extern int sb_set_blocksize(struct super_block *, int); extern int sb_min_blocksize(struct super_block *, int); +extern int sb_has_dirty_inodes(struct super_block *); extern int generic_file_mmap(struct file *, struct vm_area_struct *); extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *); -- ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 4/4] writeback: fix ntfs with sb_has_dirty_inodes() [not found] ` <20070810063419.786586676@mail.ustc.edu.cn> 2007-08-10 6:34 ` [PATCH 4/4] writeback: fix ntfs with sb_has_dirty_inodes() Fengguang Wu @ 2007-08-10 6:34 ` Fengguang Wu 1 sibling, 0 replies; 13+ messages in thread From: Fengguang Wu @ 2007-08-10 6:34 UTC (permalink / raw) To: Andrew Morton Cc: Ken Chen, Anton Altaparmakov, Andrew Morton, Miklos Szeredi, linux-kernel, linux-fsdevel [-- Attachment #1: nfs-dirty-inodes.patch --] [-- Type: text/plain, Size: 1453 bytes --] NTFS's if-condition on dirty inodes is not complete. Fix it with sb_has_dirty_inodes(). Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Ken Chen <kenchen@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> --- --- linux-2.6.23-rc1-mm2.orig/fs/ntfs/super.c +++ linux-2.6.23-rc1-mm2/fs/ntfs/super.c @@ -2381,14 +2381,14 @@ static void ntfs_put_super(struct super_ */ ntfs_commit_inode(vol->mft_ino); write_inode_now(vol->mft_ino, 1); - if (!list_empty(&sb->s_dirty)) { + if (sb_has_dirty_inodes(sb)) { const char *s1, *s2; mutex_lock(&vol->mft_ino->i_mutex); truncate_inode_pages(vol->mft_ino->i_mapping, 0); mutex_unlock(&vol->mft_ino->i_mutex); write_inode_now(vol->mft_ino, 1); - if (!list_empty(&sb->s_dirty)) { + if (sb_has_dirty_inodes(sb)) { static const char *_s1 = "inodes"; static const char *_s2 = ""; s1 = _s1; --- linux-2.6.23-rc1-mm2.orig/include/linux/fs.h +++ linux-2.6.23-rc1-mm2/include/linux/fs.h @@ -1772,6 +1772,7 @@ extern int bdev_read_only(struct block_d extern int set_blocksize(struct block_device *, int); extern int sb_set_blocksize(struct super_block *, int); extern int sb_min_blocksize(struct super_block *, int); +extern int sb_has_dirty_inodes(struct super_block *); extern int generic_file_mmap(struct file *, struct vm_area_struct *); extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *); -- ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2007-08-10 17:05 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20070810063412.238042387@mail.ustc.edu.cn>
2007-08-10 6:34 ` [PATCH 0/4] [RFC][PATCH] fs-writeback: redefining the dirty inode queues Fengguang Wu
2007-08-10 6:34 ` Fengguang Wu
[not found] ` <20070810063419.454829766@mail.ustc.edu.cn>
2007-08-10 6:34 ` [PATCH 1/4] writeback: check time-ordering of s_io and s_more_io Fengguang Wu
2007-08-10 6:34 ` Fengguang Wu
[not found] ` <20070810063419.549052142@mail.ustc.edu.cn>
2007-08-10 6:34 ` [PATCH 2/4] writeback: 3-queue based writeback schedule Fengguang Wu
2007-08-10 6:34 ` Fengguang Wu
[not found] ` <20070810164715.GA5508@mail.ustc.edu.cn>
2007-08-10 16:47 ` Fengguang Wu
2007-08-10 16:47 ` Fengguang Wu
[not found] ` <20070810170534.GA5137@mail.ustc.edu.cn>
2007-08-10 17:05 ` Fengguang Wu
[not found] ` <20070810063419.657077419@mail.ustc.edu.cn>
2007-08-10 6:34 ` [PATCH 3/4] writeback: function renames and cleanups Fengguang Wu
2007-08-10 6:34 ` Fengguang Wu
[not found] ` <20070810063419.786586676@mail.ustc.edu.cn>
2007-08-10 6:34 ` [PATCH 4/4] writeback: fix ntfs with sb_has_dirty_inodes() Fengguang Wu
2007-08-10 6:34 ` Fengguang Wu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).