* [PATCH 0/2] writeback dirty inodes fixes
From: Fengguang Wu @ 2007-08-11 6:02 UTC
To: Andrew Morton; +Cc: Ken Chen, linux-fsdevel, linux-kernel
Andrew,
Now the patches are simplified and rebased to 2.6.23-rc2-mm2.
The following two patches should be put immediately after
writeback-fix-periodic-superblock-dirty-inode-flushing.patch:
writeback: fix time ordering of the per superblock inode lists 8
writeback: fix ntfs with sb_has_dirty_inodes()
Thank you,
Fengguang
* [PATCH 1/2] writeback: fix time ordering of the per superblock inode lists 8
From: Fengguang Wu @ 2007-08-11 6:02 UTC
To: Andrew Morton; +Cc: Ken Chen, Andrew Morton, linux-fsdevel, linux-kernel
[-- Attachment #1: inode-dirty-time-ordering-fix.patch --]
[-- Type: text/plain, Size: 2900 bytes --]
Fix the time ordering bug re-introduced by
writeback-fix-periodic-superblock-dirty-inode-flushing.patch.
It works by never moving not-yet-expired dirty inodes from s_dirty to s_io
*only to* move them back again: the move-inodes-back-and-forth approach is a mess.
Cc: Ken Chen <kenchen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
fs/fs-writeback.c | 40 ++++++++++++++++++++++------------------
1 file changed, 22 insertions(+), 18 deletions(-)
--- linux-2.6.23-rc2-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc2-mm2/fs/fs-writeback.c
@@ -172,6 +172,23 @@ static void requeue_io(struct inode *ino
}
/*
+ * Queue expired dirty inodes for io.
+ */
+static void queue_io(struct super_block *sb,
+ unsigned long *older_than_this)
+{
+ while (!list_empty(&sb->s_dirty)) {
+ struct inode *inode = list_entry(sb->s_dirty.prev,
+ struct inode, i_list);
+ /* Was this inode dirtied too recently? */
+ if (older_than_this &&
+ time_after(inode->dirtied_when, *older_than_this))
+ break;
+ list_move(&inode->i_list, &sb->s_io);
+ }
+}
+
+/*
* Write a single inode's dirty pages and inode data out to disk.
* If `wait' is set, wait on the writeout.
*
@@ -295,10 +312,10 @@ __writeback_single_inode(struct inode *i
/*
* We're skipping this inode because it's locked, and we're not
- * doing writeback-for-data-integrity. Move it to the head of
- * s_dirty so that writeback can proceed with the other inodes
- * on s_io. We'll have another go at writing back this inode
- * when the s_dirty iodes get moved back onto s_io.
+ * doing writeback-for-data-integrity. Move it to s_more_io so
+ * that writeback can proceed with the other inodes on s_io.
+ * We'll have another go at writing back this inode when we
+ * completed a full scan of s_io.
*/
requeue_io(inode);
@@ -362,10 +379,8 @@ __writeback_single_inode(struct inode *i
static void
sync_sb_inodes(struct super_block *sb, struct writeback_control *wbc)
{
- const unsigned long start = jiffies; /* livelock avoidance */
-
if (!wbc->for_kupdate || list_empty(&sb->s_io))
- list_splice_init(&sb->s_dirty, &sb->s_io);
+ queue_io(sb, wbc->older_than_this);
while (!list_empty(&sb->s_io)) {
struct inode *inode = list_entry(sb->s_io.prev,
@@ -406,17 +421,6 @@ sync_sb_inodes(struct super_block *sb, s
continue; /* blockdev has wrong queue */
}
- /* Was this inode dirtied after sync_sb_inodes was called? */
- if (time_after(inode->dirtied_when, start))
- break;
-
- /* Was this inode dirtied too recently? */
- if (wbc->older_than_this && time_after(inode->dirtied_when,
- *wbc->older_than_this)) {
- list_splice_init(&sb->s_io, sb->s_dirty.prev);
- break;
- }
-
/* Is another pdflush already flushing this queue? */
if (current_is_pdflush() && !writeback_acquire(bdi))
break;
--
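To make the new queue_io() behaviour concrete, here is a small userspace model. It is a sketch under stated assumptions, not kernel code: a hand-rolled circular list stands in for the kernel's struct list_head, fake_inode and fake_sb are hypothetical types carrying only the fields queue_io() touches, and time_after() reproduces the kernel's wraparound-safe jiffies comparison. Inodes are taken from the old end (tail) of s_dirty and moved to s_io until the first one dirtied after older_than_this; nothing is ever moved back.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Wraparound-safe "a is later than b", same arithmetic as the kernel macro. */
#define time_after(a, b) ((long)((b) - (a)) < 0)

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Unlink e and re-insert it at the head (the "newest" end) of list. */
static void list_move(struct list_head *e, struct list_head *list)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = list->next;
	e->prev = list;
	list->next->prev = e;
	list->next = e;
}

struct fake_inode {                /* hypothetical: only what queue_io() reads */
	unsigned long dirtied_when;
	struct list_head i_list;
};

struct fake_sb {                   /* hypothetical: newest at head, oldest at tail */
	struct list_head s_dirty;
	struct list_head s_io;
};

/* Move expired inodes from the tail of s_dirty to s_io; never move them back. */
static void queue_io(struct fake_sb *sb, const unsigned long *older_than_this)
{
	const struct fake_inode *last = NULL;

	while (!list_empty(&sb->s_dirty)) {
		struct fake_inode *inode = list_entry(sb->s_dirty.prev,
						      struct fake_inode, i_list);

		/* s_dirty is assumed sorted: each inode is no older than the last. */
		assert(!last || !time_after(last->dirtied_when, inode->dirtied_when));

		/* Dirtied too recently? Then so were all the newer ones. */
		if (older_than_this &&
		    time_after(inode->dirtied_when, *older_than_this))
			break;
		list_move(&inode->i_list, &sb->s_io);
		last = inode;
	}
}
```

Because the scan stops at the first unexpired inode and never splices s_io back onto s_dirty, the oldest-at-tail ordering of both lists is preserved; the assert() in the model encodes the assumption that s_dirty arrives sorted.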
* [PATCH 2/2] writeback: fix ntfs with sb_has_dirty_inodes()
From: Fengguang Wu @ 2007-08-11 6:02 UTC
To: Andrew Morton
Cc: Ken Chen, Anton Altaparmakov, Andrew Morton, linux-fsdevel,
linux-kernel
[-- Attachment #1: nfs-dirty-inodes.patch --]
[-- Type: text/plain, Size: 2418 bytes --]
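The dirty-inode check in ntfs_put_super() is incomplete: it only tests
s_dirty, so it misses dirty inodes that have already been moved to s_io.
Fix it with the new helper sb_has_dirty_inodes().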
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Ken Chen <kenchen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
fs/fs-writeback.c | 9 ++++++++-
fs/ntfs/super.c | 4 ++--
include/linux/fs.h | 1 +
3 files changed, 11 insertions(+), 3 deletions(-)
--- linux-2.6.23-rc2-mm2.orig/fs/ntfs/super.c
+++ linux-2.6.23-rc2-mm2/fs/ntfs/super.c
@@ -2381,14 +2381,14 @@ static void ntfs_put_super(struct super_
*/
ntfs_commit_inode(vol->mft_ino);
write_inode_now(vol->mft_ino, 1);
- if (!list_empty(&sb->s_dirty)) {
+ if (sb_has_dirty_inodes(sb)) {
const char *s1, *s2;
mutex_lock(&vol->mft_ino->i_mutex);
truncate_inode_pages(vol->mft_ino->i_mapping, 0);
mutex_unlock(&vol->mft_ino->i_mutex);
write_inode_now(vol->mft_ino, 1);
- if (!list_empty(&sb->s_dirty)) {
+ if (sb_has_dirty_inodes(sb)) {
static const char *_s1 = "inodes";
static const char *_s2 = "";
s1 = _s1;
--- linux-2.6.23-rc2-mm2.orig/include/linux/fs.h
+++ linux-2.6.23-rc2-mm2/include/linux/fs.h
@@ -1712,6 +1712,7 @@ extern int bdev_read_only(struct block_d
extern int set_blocksize(struct block_device *, int);
extern int sb_set_blocksize(struct super_block *, int);
extern int sb_min_blocksize(struct super_block *, int);
+extern int sb_has_dirty_inodes(struct super_block *);
extern int generic_file_mmap(struct file *, struct vm_area_struct *);
extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *);
--- linux-2.6.23-rc2-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc2-mm2/fs/fs-writeback.c
@@ -188,6 +188,13 @@ static void queue_io(struct super_block
}
}
+int sb_has_dirty_inodes(struct super_block *sb)
+{
+ return !list_empty(&sb->s_dirty) ||
+ !list_empty(&sb->s_io);
+}
+EXPORT_SYMBOL(sb_has_dirty_inodes);
+
/*
* Write a single inode's dirty pages and inode data out to disk.
* If `wait' is set, wait on the writeout.
@@ -485,7 +492,7 @@ writeback_inodes(struct writeback_contro
restart:
sb = sb_entry(super_blocks.prev);
for (; sb != sb_entry(&super_blocks); sb = sb_entry(sb->s_list.prev)) {
- if (!list_empty(&sb->s_dirty) || !list_empty(&sb->s_io)) {
+ if (sb_has_dirty_inodes(sb)) {
/* we're making our own get_super here */
sb->s_count++;
spin_unlock(&sb_lock);
--
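The failure mode this patch addresses can be reproduced with a tiny userspace model. It is illustrative only: fake_sb, old_ntfs_check() and the list helpers are hypothetical stand-ins, not kernel APIs. Once an inode has been moved from s_dirty to s_io, a check of s_dirty alone reports the superblock clean, while testing both lists, as sb_has_dirty_inodes() does, still sees the inode.

```c
#include <assert.h>
#include <stdbool.h>

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Unlink e from its current list and insert it at the head of list. */
static void list_move(struct list_head *e, struct list_head *list)
{
	assert(e != list);
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = list->next;
	e->prev = list;
	list->next->prev = e;
	list->next = e;
}

/* Hypothetical superblock with just the two lists the helper inspects. */
struct fake_sb {
	struct list_head s_dirty;
	struct list_head s_io;
};

/* The pre-patch NTFS test: looks at s_dirty only. */
static bool old_ntfs_check(const struct fake_sb *sb)
{
	return !list_empty(&sb->s_dirty);
}

/* Same condition as the sb_has_dirty_inodes() helper added by this patch. */
static bool sb_has_dirty_inodes(const struct fake_sb *sb)
{
	return !list_empty(&sb->s_dirty) || !list_empty(&sb->s_io);
}
```

For example, moving the single dirty inode from s_dirty onto s_io makes old_ntfs_check() return false while sb_has_dirty_inodes() still returns true.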
* Re: [PATCH 0/2] writeback dirty inodes fixes
From: Fengguang Wu @ 2007-08-11 6:14 UTC
To: Andrew Morton; +Cc: Ken Chen, linux-fsdevel, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 675 bytes --]
On Sat, Aug 11, 2007 at 02:02:02PM +0800, Fengguang Wu wrote:
> Andrew,
>
> Now the patches are simplified and rebased to 2.6.23-rc2-mm2.
>
> The following two patches should be put immediately after
> writeback-fix-periodic-superblock-dirty-inode-flushing.patch:
>
> writeback: fix time ordering of the per superblock inode lists 8
> writeback: fix ntfs with sb_has_dirty_inodes()
The following three patches should be updated to resolve merge conflicts:
sync_sb_inodes-propagate-errors.patch
reiser4-sb_sync_inodes.patch
check_dirty_inode_list.patch (extended to check s_io/s_more_io)
They are attached to this mail.
[-- Attachment #2: sync_sb_inodes-propagate-errors.patch --]
[-- Type: text/x-diff, Size: 5383 bytes --]
From: Andrew Morton <akpm@osdl.org>
Guillaume points out that sync_sb_inodes() fails to propagate error codes
back to its callers. Fix that, and make several other void-returning functions
stop dropping reportable error codes.
Cc: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/fs-writeback.c | 56 +++++++++++++++++++++++++++---------
include/linux/writeback.h | 6 +--
2 files changed, 45 insertions(+), 17 deletions(-)
--- linux-2.6.23-rc2-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc2-mm2/fs/fs-writeback.c
@@ -392,13 +392,17 @@ __writeback_single_inode(struct inode *i
* on the writer throttling path, and we get decent balancing between many
* throttled threads: we don't want them all piling up on inode_sync_wait.
*/
-static void
+static int
sync_sb_inodes(struct super_block *sb, struct writeback_control *wbc)
{
+ int ret = 0;
+
if (!wbc->for_kupdate || list_empty(&sb->s_io))
queue_io(sb, wbc->older_than_this);
while (!list_empty(&sb->s_io)) {
+ int err;
+
struct inode *inode = list_entry(sb->s_io.prev,
struct inode, i_list);
struct address_space *mapping = inode->i_mapping;
@@ -444,7 +448,9 @@ sync_sb_inodes(struct super_block *sb, s
BUG_ON(inode->i_state & I_FREEING);
__iget(inode);
pages_skipped = wbc->pages_skipped;
- __writeback_single_inode(inode, wbc);
+ err = __writeback_single_inode(inode, wbc);
+ if (!ret)
+ ret = err;
if (wbc->sync_mode == WB_SYNC_HOLD) {
inode->dirtied_when = jiffies;
list_move(&inode->i_list, &sb->s_dirty);
@@ -469,7 +475,7 @@ sync_sb_inodes(struct super_block *sb, s
if (list_empty(&sb->s_io))
list_splice_init(&sb->s_more_io, &sb->s_io);
- return; /* Leave any unwritten inodes on s_io */
+ return ret; /* Leave any unwritten inodes on s_io */
}
/*
@@ -491,10 +497,10 @@ sync_sb_inodes(struct super_block *sb, s
* sync_sb_inodes will seekout the blockdev which matches `bdi'. Maybe not
* super-efficient but we're about to do a ton of I/O...
*/
-void
-writeback_inodes(struct writeback_control *wbc)
+int writeback_inodes(struct writeback_control *wbc)
{
struct super_block *sb;
+ int ret = 0;
might_sleep();
spin_lock(&sb_lock);
@@ -512,9 +518,13 @@ restart:
*/
if (down_read_trylock(&sb->s_umount)) {
if (sb->s_root) {
+ int err;
+
spin_lock(&inode_lock);
- sync_sb_inodes(sb, wbc);
+ err = sync_sb_inodes(sb, wbc);
spin_unlock(&inode_lock);
+ if (!ret)
+ ret = err;
}
up_read(&sb->s_umount);
}
@@ -526,6 +536,7 @@ restart:
break;
}
spin_unlock(&sb_lock);
+ return ret;
}
/*
@@ -539,7 +550,7 @@ restart:
* We add in the number of potentially dirty inodes, because each inode write
* can dirty pagecache in the underlying blockdev.
*/
-void sync_inodes_sb(struct super_block *sb, int wait)
+int sync_inodes_sb(struct super_block *sb, int wait)
{
struct writeback_control wbc = {
.sync_mode = wait ? WB_SYNC_ALL : WB_SYNC_HOLD,
@@ -548,14 +559,16 @@ void sync_inodes_sb(struct super_block *
};
unsigned long nr_dirty = global_page_state(NR_FILE_DIRTY);
unsigned long nr_unstable = global_page_state(NR_UNSTABLE_NFS);
+ int ret;
wbc.nr_to_write = nr_dirty + nr_unstable +
(inodes_stat.nr_inodes - inodes_stat.nr_unused) +
nr_dirty + nr_unstable;
wbc.nr_to_write += wbc.nr_to_write / 2; /* Bit more for luck */
spin_lock(&inode_lock);
- sync_sb_inodes(sb, &wbc);
+ ret = sync_sb_inodes(sb, &wbc);
spin_unlock(&inode_lock);
+ return ret;
}
/*
@@ -591,13 +604,16 @@ static void set_sb_syncing(int val)
* outstanding dirty inodes, the writeback goes block-at-a-time within the
* filesystem's write_inode(). This is extremely slow.
*/
-static void __sync_inodes(int wait)
+static int __sync_inodes(int wait)
{
struct super_block *sb;
+ int ret = 0;
spin_lock(&sb_lock);
restart:
list_for_each_entry(sb, &super_blocks, s_list) {
+ int err;
+
if (sb->s_syncing)
continue;
sb->s_syncing = 1;
@@ -605,8 +621,12 @@ restart:
spin_unlock(&sb_lock);
down_read(&sb->s_umount);
if (sb->s_root) {
- sync_inodes_sb(sb, wait);
- sync_blockdev(sb->s_bdev);
+ err = sync_inodes_sb(sb, wait);
+ if (!ret)
+ ret = err;
+ err = sync_blockdev(sb->s_bdev);
+ if (!ret)
+ ret = err;
}
up_read(&sb->s_umount);
spin_lock(&sb_lock);
@@ -614,17 +634,25 @@ restart:
goto restart;
}
spin_unlock(&sb_lock);
+ return ret;
}
-void sync_inodes(int wait)
+int sync_inodes(int wait)
{
+ int ret;
+
set_sb_syncing(0);
- __sync_inodes(0);
+ ret = __sync_inodes(0);
if (wait) {
+ int err;
+
set_sb_syncing(0);
- __sync_inodes(1);
+ err = __sync_inodes(1);
+ if (!ret)
+ ret = err;
}
+ return ret;
}
/**
--- linux-2.6.23-rc2-mm2.orig/include/linux/writeback.h
+++ linux-2.6.23-rc2-mm2/include/linux/writeback.h
@@ -68,10 +68,10 @@ struct writeback_control {
/*
* fs/fs-writeback.c
*/
-void writeback_inodes(struct writeback_control *wbc);
+int writeback_inodes(struct writeback_control *wbc);
int inode_wait(void *);
-void sync_inodes_sb(struct super_block *, int wait);
-void sync_inodes(int wait);
+int sync_inodes_sb(struct super_block *, int wait);
+int sync_inodes(int wait);
/* writeback.h requires fs.h; it, too, is not included from here. */
static inline void wait_on_inode(struct inode *inode)
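The error handling added throughout this patch follows one idiom: keep writing back the remaining items after a failure, but remember only the first error (if (!ret) ret = err;). Below is a minimal sketch of that pattern; write_all() and demo_writer() are hypothetical stand-ins for sync_sb_inodes() and __writeback_single_inode(), not functions from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-item writer: 0 on success, negative errno on failure. */
typedef int (*write_fn)(int item);

/*
 * Attempt every item even after a failure, but report the first error,
 * using the "if (!ret) ret = err;" idiom from the patch.
 */
static int write_all(const int *items, size_t n, write_fn fn, size_t *attempted)
{
	int ret = 0;
	size_t i;

	assert(fn != NULL && attempted != NULL);
	*attempted = 0;
	for (i = 0; i < n; i++) {
		int err = fn(items[i]);

		(*attempted)++;
		if (!ret)
			ret = err;
	}
	return ret;
}

/* Hypothetical writer: item 4 fails with -ENOSPC, odd items with -EIO. */
static int demo_writer(int item)
{
	if (item == 4)
		return -ENOSPC;
	return (item % 2) ? -EIO : 0;
}
```

With items {2, 4, 5, 6}, every item is still attempted and write_all() returns -ENOSPC from item 4 rather than the later -EIO from item 5. In the patch, __sync_inodes() and sync_inodes() chain the same idiom across sync_inodes_sb()/sync_blockdev() and across their two passes.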
[-- Attachment #3: reiser4-sb_sync_inodes.patch --]
[-- Type: text/x-diff, Size: 4704 bytes --]
From: Hans Reiser <reiser@namesys.com>
This patch adds a new operation, sync_inodes, to struct super_operations,
adds a generic implementation, and changes fs-writeback.c:sync_sb_inodes() to
call the filesystem's sync_inodes if it is defined, or the generic
implementation otherwise. The new operation lets a filesystem decide for
itself what to flush.
Reiser4 flushes dirty pages on the basis of atoms, not inodes. sync_sb_inodes
used to call the address space flushing method (writepages) for every dirty
inode. For reiser4 this meant committing atoms unnecessarily often, which
caused a substantial slowdown. Having this method fixed that problem.
Also, make generic_sync_sb_inodes take the spin lock itself. This helps
reiser4 get rid of some oddities.
sync_sb_inodes is always called as follows:
spin_lock(&inode_lock);
sync_sb_inodes(sb, wbc);
spin_unlock(&inode_lock);
This patch moves the spin_lock/spin_unlock pair down into generic_sync_sb_inodes.
[deweerdt@free.fr: lockdep: unbalance at generic_sync_sb_inodes]
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/fs-writeback.c | 31 ++++++++++++++++---------------
include/linux/fs.h | 3 +++
2 files changed, 19 insertions(+), 15 deletions(-)
--- linux-2.6.23-rc2-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc2-mm2/fs/fs-writeback.c
@@ -375,8 +375,6 @@ __writeback_single_inode(struct inode *i
* WB_SYNC_HOLD is a hack for sys_sync(): reattach the inode to sb->s_dirty so
* that it can be located for waiting on in __writeback_single_inode().
*
- * Called under inode_lock.
- *
* If `bdi' is non-zero then we're being asked to writeback a specific queue.
* This function assumes that the blockdev superblock's inodes are backed by
* a variety of queues, so all inodes are searched. For other superblocks,
@@ -392,11 +390,13 @@ __writeback_single_inode(struct inode *i
* on the writer throttling path, and we get decent balancing between many
* throttled threads: we don't want them all piling up on inode_sync_wait.
*/
-static int
-sync_sb_inodes(struct super_block *sb, struct writeback_control *wbc)
+int generic_sync_sb_inodes(struct super_block *sb,
+ struct writeback_control *wbc)
{
int ret = 0;
+ spin_lock(&inode_lock);
+
if (!wbc->for_kupdate || list_empty(&sb->s_io))
queue_io(sb, wbc->older_than_this);
@@ -474,9 +474,18 @@ sync_sb_inodes(struct super_block *sb, s
if (list_empty(&sb->s_io))
list_splice_init(&sb->s_more_io, &sb->s_io);
-
+ spin_unlock(&inode_lock);
return ret; /* Leave any unwritten inodes on s_io */
}
+EXPORT_SYMBOL(generic_sync_sb_inodes);
+
+static int sync_sb_inodes(struct super_block *sb, struct writeback_control *wbc)
+{
+ if (sb->s_op->sync_inodes)
+ return sb->s_op->sync_inodes(sb, wbc);
+ else
+ return generic_sync_sb_inodes(sb, wbc);
+}
/*
* Start writeback of dirty pagecache data against all unlocked inodes.
@@ -518,11 +527,7 @@ restart:
*/
if (down_read_trylock(&sb->s_umount)) {
if (sb->s_root) {
- int err;
-
- spin_lock(&inode_lock);
- err = sync_sb_inodes(sb, wbc);
- spin_unlock(&inode_lock);
+ int err = sync_sb_inodes(sb, wbc);
if (!ret)
ret = err;
}
@@ -559,16 +564,12 @@ int sync_inodes_sb(struct super_block *s
};
unsigned long nr_dirty = global_page_state(NR_FILE_DIRTY);
unsigned long nr_unstable = global_page_state(NR_UNSTABLE_NFS);
- int ret;
wbc.nr_to_write = nr_dirty + nr_unstable +
(inodes_stat.nr_inodes - inodes_stat.nr_unused) +
nr_dirty + nr_unstable;
wbc.nr_to_write += wbc.nr_to_write / 2; /* Bit more for luck */
- spin_lock(&inode_lock);
- ret = sync_sb_inodes(sb, &wbc);
- spin_unlock(&inode_lock);
- return ret;
+ return sync_sb_inodes(sb, &wbc);
}
/*
--- linux-2.6.23-rc2-mm2.orig/include/linux/fs.h
+++ linux-2.6.23-rc2-mm2/include/linux/fs.h
@@ -1260,6 +1260,8 @@ struct super_operations {
void (*clear_inode) (struct inode *);
void (*umount_begin) (struct vfsmount *, int);
+ int (*sync_inodes) (struct super_block *sb,
+ struct writeback_control *wbc);
int (*show_options)(struct seq_file *, struct vfsmount *);
int (*show_stats)(struct seq_file *, struct vfsmount *);
#ifdef CONFIG_QUOTA
@@ -1654,6 +1656,7 @@ extern int invalidate_inode_pages2(struc
extern int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
extern int write_inode_now(struct inode *, int);
+extern int generic_sync_sb_inodes(struct super_block *, struct writeback_control *);
extern int filemap_fdatawrite(struct address_space *);
extern int filemap_flush(struct address_space *);
extern int filemap_fdatawait(struct address_space *);
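The dispatch added to sync_sb_inodes() is the usual "optional method with a generic fallback" pattern for operation tables. Below is a cut-down userspace sketch of it: the struct definitions keep only the one member that matters, and atom_sync_inodes() plus the call counters are hypothetical, standing in for a filesystem such as reiser4 that flushes by its own unit. Note that after this patch the callers no longer take inode_lock, so the generic path takes it itself and a filesystem-provided hook is entered without it.

```c
#include <assert.h>
#include <stddef.h>

struct writeback_control;   /* opaque here; its contents don't matter */
struct super_block;

/* Cut-down model of struct super_operations with only the new hook. */
struct super_operations {
	int (*sync_inodes)(struct super_block *sb, struct writeback_control *wbc);
};

struct super_block {
	const struct super_operations *s_op;
	int generic_calls;   /* hypothetical counters, for illustration only */
	int custom_calls;
};

/* Stand-in for generic_sync_sb_inodes(): per-inode writeback. */
static int generic_sync_sb_inodes(struct super_block *sb,
				  struct writeback_control *wbc)
{
	(void)wbc;
	sb->generic_calls++;
	return 0;
}

/* Same shape as the static sync_sb_inodes() wrapper added by the patch. */
static int sync_sb_inodes(struct super_block *sb, struct writeback_control *wbc)
{
	assert(sb->s_op != NULL);
	if (sb->s_op->sync_inodes)
		return sb->s_op->sync_inodes(sb, wbc);
	return generic_sync_sb_inodes(sb, wbc);
}

/* Hypothetical filesystem hook that flushes by its own unit (e.g. atoms). */
static int atom_sync_inodes(struct super_block *sb, struct writeback_control *wbc)
{
	(void)wbc;
	sb->custom_calls++;
	return 0;
}

static const struct super_operations plain_ops = { .sync_inodes = NULL };
static const struct super_operations atom_ops = { .sync_inodes = atom_sync_inodes };
```

A filesystem that only needs some extra work could also call generic_sync_sb_inodes() from its own hook, since the patch exports that symbol.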
[-- Attachment #4: check_dirty_inode_list.patch --]
[-- Type: text/x-diff, Size: 5731 bytes --]
From: Andrew Morton <akpm@linux-foundation.org>
The per-superblock dirty-inode list super_block.s_dirty is supposed to be
sorted in reverse order of each inode's time-of-first-dirtying. This is so
that the kupdate function can avoid having to walk all the dirty inodes on the
list: it terminates the search as soon as it finds an inode which was dirtied
less than 30 seconds ago (dirty_expire_centisecs).
We have a bunch of several-year-old bugs which cause that list to not be in
the correct reverse-time-order. The result of this is that under certain
obscure circumstances, inodes get stuck and basically never get written back.
It has been reported a couple of times, but nobody really cared much because
most people use ordered-mode journalling filesystems, which take care of the
writeback independently. Plus we will _eventually_ get onto these inodes even
when the list is out of order, and a /bin/sync will still work OK.
However this is a pretty important data-integrity issue for filesystems such
as ext2.
As preparation for fixing these bugs, this patch adds a pile of fantastically
expensive debugging code which checks the sanity of the s_dirty list all over
the place, so we find out as soon as it goes bad.
The debugging code is controlled by /proc/sys/fs/inode_debug, which defaults
to off. The debugging will disable itself whenever it detects a misordering,
to avoid log spew.
We can remove all this code later.
Cc: Mike Waychison <mikew@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/fs-writeback.c | 79 +++++++++++++++++++++++++++++++++++-
include/linux/writeback.h | 1
kernel/sysctl.c | 8 +++
3 files changed, 86 insertions(+), 2 deletions(-)
--- linux-2.6.23-rc2-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc2-mm2/fs/fs-writeback.c
@@ -24,6 +24,75 @@
#include <linux/buffer_head.h>
#include "internal.h"
+int sysctl_inode_debug __read_mostly;
+
+static int __check(struct list_head *head, int print_stuff)
+{
+ struct list_head *cursor = head;
+ unsigned long dirtied_when = 0;
+
+ while ((cursor = cursor->prev) != head) {
+ struct inode *inode = list_entry(cursor, struct inode, i_list);
+ if (print_stuff) {
+ printk("%p:%lu\n", inode, inode->dirtied_when);
+ } else {
+ if (dirtied_when &&
+ time_before(inode->dirtied_when, dirtied_when))
+ return 1;
+ dirtied_when = inode->dirtied_when;
+ }
+ }
+ return 0;
+}
+
+static void __check_dirty_inode_list(struct super_block *sb,
+ struct inode *inode, const char *file, int line)
+{
+ if (!sysctl_inode_debug)
+ return;
+
+ if (__check(&sb->s_dirty, 0)) {
+ sysctl_inode_debug = 0;
+ if (inode)
+ printk("%s:%d: s_dirty got screwed up. inode=%p:%lu\n",
+ file, line, inode, inode->dirtied_when);
+ else
+ printk("%s:%d: s_dirty got screwed up\n", file, line);
+ __check(&sb->s_dirty, 1);
+ }
+ if (__check(&sb->s_io, 0)) {
+ sysctl_inode_debug = 0;
+ if (inode)
+ printk("%s:%d: s_io got screwed up. inode=%p:%lu\n",
+ file, line, inode, inode->dirtied_when);
+ else
+ printk("%s:%d: s_io got screwed up\n", file, line);
+ __check(&sb->s_io, 1);
+ }
+ if (__check(&sb->s_more_io, 0)) {
+ sysctl_inode_debug = 0;
+ if (inode)
+ printk("%s:%d: s_more_io got screwed up. inode=%p:%lu\n",
+ file, line, inode, inode->dirtied_when);
+ else
+ printk("%s:%d: s_more_io got screwed up\n", file, line);
+ __check(&sb->s_more_io, 1);
+ }
+}
+
+#define check_dirty_inode_list(sb) \
+ do { \
+ if (unlikely(sysctl_inode_debug)) \
+ __check_dirty_inode_list(sb, NULL, __FILE__, __LINE__); \
+ } while (0)
+
+#define check_dirty_inode(inode) \
+ do { \
+ if (unlikely(sysctl_inode_debug)) \
+ __check_dirty_inode_list(inode->i_sb, inode, \
+ __FILE__, __LINE__); \
+ } while (0)
+
/**
* __mark_inode_dirty - internal function
* @inode: inode to mark
@@ -122,8 +191,10 @@ void __mark_inode_dirty(struct inode *in
* reposition it (that would break s_dirty time-ordering).
*/
if (!was_dirty) {
+ check_dirty_inode(inode);
inode->dirtied_when = jiffies;
list_move(&inode->i_list, &sb->s_dirty);
+ check_dirty_inode(inode);
}
}
out:
@@ -152,6 +223,7 @@ static void redirty_tail(struct inode *i
{
struct super_block *sb = inode->i_sb;
+ check_dirty_inode(inode);
if (!list_empty(&sb->s_dirty)) {
struct inode *tail_inode;
@@ -161,6 +233,7 @@ static void redirty_tail(struct inode *i
inode->dirtied_when = jiffies;
}
list_move(&inode->i_list, &sb->s_dirty);
+ check_dirty_inode(inode);
}
/*
@@ -452,8 +525,10 @@ int generic_sync_sb_inodes(struct super_
if (!ret)
ret = err;
if (wbc->sync_mode == WB_SYNC_HOLD) {
+ check_dirty_inode(inode);
inode->dirtied_when = jiffies;
list_move(&inode->i_list, &sb->s_dirty);
+ check_dirty_inode(inode);
}
if (current_is_pdflush())
writeback_release(bdi);
--- linux-2.6.23-rc2-mm2.orig/include/linux/writeback.h
+++ linux-2.6.23-rc2-mm2/include/linux/writeback.h
@@ -140,5 +140,6 @@ void writeback_set_ratelimit(void);
extern int nr_pdflush_threads; /* Global so it can be exported to sysctl
read-only. */
+extern int sysctl_inode_debug;
#endif /* WRITEBACK_H */
--- linux-2.6.23-rc2-mm2.orig/kernel/sysctl.c
+++ linux-2.6.23-rc2-mm2/kernel/sysctl.c
@@ -1238,6 +1238,14 @@ static struct ctl_table fs_table[] = {
.mode = 0644,
.proc_handler = &proc_dointvec,
},
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "inode_debug",
+ .data = &sysctl_inode_debug,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
{
.ctl_name = CTL_UNNUMBERED,
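The ordering invariant that check_dirty_inode_list enforces can be sketched in userspace. This model walks dirtied_when values in the order __check() visits them (tail first, i.e. oldest first) and uses the same wraparound-safe time_before() test; as in __check(), a zero value means "no predecessor yet". The inode_debug flag is a hypothetical stand-in for /proc/sys/fs/inode_debug and, like sysctl_inode_debug, is cleared on the first misordering. Arrays replace the kernel lists to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

#define time_after(a, b)  ((long)((b) - (a)) < 0)
#define time_before(a, b) time_after(b, a)

static int inode_debug = 1;   /* hypothetical model of /proc/sys/fs/inode_debug */

/*
 * Same test as __check(): visiting entries oldest-first (tail to head),
 * return 1 as soon as one was dirtied before its predecessor.
 */
static int check_order(const unsigned long *dirtied_when, size_t n)
{
	unsigned long prev = 0;
	size_t i;

	assert(n == 0 || dirtied_when != NULL);
	for (i = 0; i < n; i++) {
		if (prev && time_before(dirtied_when[i], prev))
			return 1;
		prev = dirtied_when[i];
	}
	return 0;
}

/* Like __check_dirty_inode_list(): switch debugging off after the first hit. */
static void check_list(const unsigned long *dirtied_when, size_t n)
{
	if (!inode_debug)
		return;
	if (check_order(dirtied_when, n))
		inode_debug = 0;
}
```

For instance, {100, 300, 200} is reported as misordered, while a sequence that crosses the jiffies wrap, such as {ULONG_MAX - 9, 5}, is still accepted because the comparison is done on the signed difference.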
}
/*
@@ -539,7 +550,7 @@ restart:
* We add in the number of potentially dirty inodes, because each inode write
* can dirty pagecache in the underlying blockdev.
*/
-void sync_inodes_sb(struct super_block *sb, int wait)
+int sync_inodes_sb(struct super_block *sb, int wait)
{
struct writeback_control wbc = {
.sync_mode = wait ? WB_SYNC_ALL : WB_SYNC_HOLD,
@@ -548,14 +559,16 @@ void sync_inodes_sb(struct super_block *
};
unsigned long nr_dirty = global_page_state(NR_FILE_DIRTY);
unsigned long nr_unstable = global_page_state(NR_UNSTABLE_NFS);
+ int ret;
wbc.nr_to_write = nr_dirty + nr_unstable +
(inodes_stat.nr_inodes - inodes_stat.nr_unused) +
nr_dirty + nr_unstable;
wbc.nr_to_write += wbc.nr_to_write / 2; /* Bit more for luck */
spin_lock(&inode_lock);
- sync_sb_inodes(sb, &wbc);
+ ret = sync_sb_inodes(sb, &wbc);
spin_unlock(&inode_lock);
+ return ret;
}
/*
@@ -591,13 +604,16 @@ static void set_sb_syncing(int val)
* outstanding dirty inodes, the writeback goes block-at-a-time within the
* filesystem's write_inode(). This is extremely slow.
*/
-static void __sync_inodes(int wait)
+static int __sync_inodes(int wait)
{
struct super_block *sb;
+ int ret = 0;
spin_lock(&sb_lock);
restart:
list_for_each_entry(sb, &super_blocks, s_list) {
+ int err;
+
if (sb->s_syncing)
continue;
sb->s_syncing = 1;
@@ -605,8 +621,12 @@ restart:
spin_unlock(&sb_lock);
down_read(&sb->s_umount);
if (sb->s_root) {
- sync_inodes_sb(sb, wait);
- sync_blockdev(sb->s_bdev);
+ err = sync_inodes_sb(sb, wait);
+ if (!ret)
+ ret = err;
+ err = sync_blockdev(sb->s_bdev);
+ if (!ret)
+ ret = err;
}
up_read(&sb->s_umount);
spin_lock(&sb_lock);
@@ -614,17 +634,25 @@ restart:
goto restart;
}
spin_unlock(&sb_lock);
+ return ret;
}
-void sync_inodes(int wait)
+int sync_inodes(int wait)
{
+ int ret;
+
set_sb_syncing(0);
- __sync_inodes(0);
+ ret = __sync_inodes(0);
if (wait) {
+ int err;
+
set_sb_syncing(0);
- __sync_inodes(1);
+ err = __sync_inodes(1);
+ if (!ret)
+ ret = err;
}
+ return ret;
}
/**
--- linux-2.6.23-rc2-mm2.orig/include/linux/writeback.h
+++ linux-2.6.23-rc2-mm2/include/linux/writeback.h
@@ -68,10 +68,10 @@ struct writeback_control {
/*
* fs/fs-writeback.c
*/
-void writeback_inodes(struct writeback_control *wbc);
+int writeback_inodes(struct writeback_control *wbc);
int inode_wait(void *);
-void sync_inodes_sb(struct super_block *, int wait);
-void sync_inodes(int wait);
+int sync_inodes_sb(struct super_block *, int wait);
+int sync_inodes(int wait);
/* writeback.h requires fs.h; it, too, is not included from here. */
static inline void wait_on_inode(struct inode *inode)
[-- Attachment #3: reiser4-sb_sync_inodes.patch --]
[-- Type: text/x-diff, Size: 4704 bytes --]
From: Hans Reiser <reiser@namesys.com>
This patch adds a new operation to struct super_operations, sync_inodes, plus
a generic implementation, and changes fs-writeback.c:sync_sb_inodes() to call
the filesystem's sync_inodes if it is defined, or the generic implementation
otherwise. The new operation lets a filesystem decide for itself what to flush.
Reiser4 flushes dirty pages on the basis of atoms, not inodes. sync_sb_inodes
used to call the address space flushing method (writepages) for every dirty
inode, which forced reiser4 to commit atoms unnecessarily often and caused a
substantial slowdown. This method fixes that problem.
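The dispatch added to sync_sb_inodes() is an optional-hook-with-fallback pattern: use the filesystem's method when the pointer is non-NULL, otherwise run the generic per-inode walk. A simplified user-space sketch of that shape follows. `struct sb`, `struct sb_ops`, `generic_sync()`, `atom_sync()` and the call counters are hypothetical, heavily trimmed stand-ins for the kernel structures, not their real layout.

```c
#include <assert.h>
#include <stddef.h>

struct sb;

/* Hypothetical trimmed-down super_operations: only the new hook. */
struct sb_ops {
	/* Optional: a filesystem may leave this NULL. */
	int (*sync_inodes)(struct sb *sb, int nr_to_write);
};

/* Hypothetical trimmed-down super_block. */
struct sb {
	const struct sb_ops *s_op;
	int generic_calls;	/* bookkeeping for the example only */
	int custom_calls;
};

/* Default per-inode walk (stands in for generic_sync_sb_inodes()). */
static int generic_sync(struct sb *sb, int nr_to_write)
{
	(void)nr_to_write;
	sb->generic_calls++;
	return 0;
}

/* Dispatcher: prefer the filesystem's own method, else fall back. */
static int sync_sb(struct sb *sb, int nr_to_write)
{
	assert(sb != NULL && sb->s_op != NULL);
	if (sb->s_op->sync_inodes)
		return sb->s_op->sync_inodes(sb, nr_to_write);
	return generic_sync(sb, nr_to_write);
}

/* A filesystem that flushes in its own units (e.g. reiser4 atoms). */
static int atom_sync(struct sb *sb, int nr_to_write)
{
	(void)nr_to_write;
	sb->custom_calls++;
	return 0;
}

static const struct sb_ops plain_ops = { .sync_inodes = NULL };
static const struct sb_ops atom_ops = { .sync_inodes = atom_sync };
```

Filesystems that do not set the hook keep the old behaviour unchanged, which is why the generic routine is exported rather than duplicated.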
Also, make generic_sync_sb_inodes take inode_lock itself. This helps reiser4
get rid of some oddities.
sync_sb_inodes is always called like this:
spin_lock(&inode_lock);
sync_sb_inodes(sb, wbc);
spin_unlock(&inode_lock);
This patch moves the spin_lock/spin_unlock pair down into
generic_sync_sb_inodes.
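Moving the lock into the callee means every caller must drop its own lock/unlock pair at the same time; a leftover outer spin_lock would make the callee take a non-recursive lock twice. A small sketch under stated assumptions: `toy_lock()`/`toy_unlock()` are a hypothetical non-recursive lock (an assert fires on re-acquisition, standing in for a spinlock self-deadlock), and `nr_dirty` is hypothetical shared state.

```c
#include <assert.h>

/* Toy non-recursive lock standing in for inode_lock. Taking it while
 * already held trips the assert, as a double spin_lock would hang. */
static int inode_lock_held;

static void toy_lock(void)
{
	assert(!inode_lock_held);
	inode_lock_held = 1;
}

static void toy_unlock(void)
{
	assert(inode_lock_held);
	inode_lock_held = 0;
}

static int nr_dirty = 3;	/* hypothetical state protected by the lock */

/* After the change: the generic helper brackets its own critical section. */
static int generic_sync_all(void)
{
	int written = 0;

	toy_lock();
	while (nr_dirty > 0) {
		nr_dirty--;
		written++;
	}
	toy_unlock();
	return written;
}

/* Callers now invoke it bare, with no lock/unlock around the call. */
static int sync_all(void)
{
	return generic_sync_all();
}
```

A filesystem-specific sync_inodes method is then free to call generic_sync_sb_inodes() without first having to acquire inode_lock on its behalf.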
[deweerdt@free.fr: lockdep: unbalance at generic_sync_sb_inodes]
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/fs-writeback.c | 31 ++++++++++++++++---------------
include/linux/fs.h | 3 +++
2 files changed, 19 insertions(+), 15 deletions(-)
--- linux-2.6.23-rc2-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc2-mm2/fs/fs-writeback.c
@@ -375,8 +375,6 @@ __writeback_single_inode(struct inode *i
* WB_SYNC_HOLD is a hack for sys_sync(): reattach the inode to sb->s_dirty so
* that it can be located for waiting on in __writeback_single_inode().
*
- * Called under inode_lock.
- *
* If `bdi' is non-zero then we're being asked to writeback a specific queue.
* This function assumes that the blockdev superblock's inodes are backed by
* a variety of queues, so all inodes are searched. For other superblocks,
@@ -392,11 +390,13 @@ __writeback_single_inode(struct inode *i
* on the writer throttling path, and we get decent balancing between many
* throttled threads: we don't want them all piling up on inode_sync_wait.
*/
-static int
-sync_sb_inodes(struct super_block *sb, struct writeback_control *wbc)
+int generic_sync_sb_inodes(struct super_block *sb,
+ struct writeback_control *wbc)
{
int ret = 0;
+ spin_lock(&inode_lock);
+
if (!wbc->for_kupdate || list_empty(&sb->s_io))
queue_io(sb, wbc->older_than_this);
@@ -474,9 +474,18 @@ sync_sb_inodes(struct super_block *sb, s
if (list_empty(&sb->s_io))
list_splice_init(&sb->s_more_io, &sb->s_io);
-
+ spin_unlock(&inode_lock);
return ret; /* Leave any unwritten inodes on s_io */
}
+EXPORT_SYMBOL(generic_sync_sb_inodes);
+
+static int sync_sb_inodes(struct super_block *sb, struct writeback_control *wbc)
+{
+ if (sb->s_op->sync_inodes)
+ return sb->s_op->sync_inodes(sb, wbc);
+ else
+ return generic_sync_sb_inodes(sb, wbc);
+}
/*
* Start writeback of dirty pagecache data against all unlocked inodes.
@@ -518,11 +527,7 @@ restart:
*/
if (down_read_trylock(&sb->s_umount)) {
if (sb->s_root) {
- int err;
-
- spin_lock(&inode_lock);
- err = sync_sb_inodes(sb, wbc);
- spin_unlock(&inode_lock);
+ int err = sync_sb_inodes(sb, wbc);
if (!ret)
ret = err;
}
@@ -559,16 +564,12 @@ int sync_inodes_sb(struct super_block *s
};
unsigned long nr_dirty = global_page_state(NR_FILE_DIRTY);
unsigned long nr_unstable = global_page_state(NR_UNSTABLE_NFS);
- int ret;
wbc.nr_to_write = nr_dirty + nr_unstable +
(inodes_stat.nr_inodes - inodes_stat.nr_unused) +
nr_dirty + nr_unstable;
wbc.nr_to_write += wbc.nr_to_write / 2; /* Bit more for luck */
- spin_lock(&inode_lock);
- ret = sync_sb_inodes(sb, &wbc);
- spin_unlock(&inode_lock);
- return ret;
+ return sync_sb_inodes(sb, &wbc);
}
/*
--- linux-2.6.23-rc2-mm2.orig/include/linux/fs.h
+++ linux-2.6.23-rc2-mm2/include/linux/fs.h
@@ -1260,6 +1260,8 @@ struct super_operations {
void (*clear_inode) (struct inode *);
void (*umount_begin) (struct vfsmount *, int);
+ int (*sync_inodes) (struct super_block *sb,
+ struct writeback_control *wbc);
int (*show_options)(struct seq_file *, struct vfsmount *);
int (*show_stats)(struct seq_file *, struct vfsmount *);
#ifdef CONFIG_QUOTA
@@ -1654,6 +1656,7 @@ extern int invalidate_inode_pages2(struc
extern int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
extern int write_inode_now(struct inode *, int);
+extern int generic_sync_sb_inodes(struct super_block *, struct writeback_control *);
extern int filemap_fdatawrite(struct address_space *);
extern int filemap_flush(struct address_space *);
extern int filemap_fdatawait(struct address_space *);
[-- Attachment #4: check_dirty_inode_list.patch --]
[-- Type: text/x-diff, Size: 5731 bytes --]
From: Andrew Morton <akpm@linux-foundation.org>
The per-superblock dirty-inode list super_block.s_dirty is supposed to be
sorted in reverse order of each inode's time-of-first-dirtying. This lets the
kupdate function avoid walking all the dirty inodes on the list: it stops the
search as soon as it finds an inode that was dirtied less than
dirty_expire_centisecs (30 seconds by default) ago.
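The early exit is only correct if the list really is time-ordered. A minimal sketch of the scan, assuming the dirty times are given oldest first (the order kupdate walks s_dirty from its tail); `kupdate_scan()` is a hypothetical helper and `TIME_BEFORE` mirrors the wraparound-safe comparison of the kernel's `time_before()`.

```c
#include <assert.h>
#include <stddef.h>

/* Wraparound-safe "a is before b", same idea as the kernel's time_before(). */
#define TIME_BEFORE(a, b) ((long)((a) - (b)) < 0)

/* dirtied[] holds dirty times, oldest first. Returns how many inodes
 * would be written back: the scan stops at the first inode dirtied after
 * the cutoff, trusting that everything behind it is younger still. */
static size_t kupdate_scan(const unsigned long *dirtied, size_t n,
			   unsigned long now, unsigned long expire)
{
	unsigned long older_than_this = now - expire;
	size_t i;

	assert(dirtied != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (TIME_BEFORE(older_than_this, dirtied[i]))
			break;	/* too young: end of the expired run */
	}
	return i;
}
```

With times {100, 200, 300, 900, 950}, now = 1000 and expire = 500, three inodes are written. If the list is misordered as {100, 900, 200, 300}, the scan stops after one, and the expired inodes at 200 and 300 stay stuck behind the young one, which is the failure described below.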
We have a bunch of several-year-old bugs which cause that list to not be in
the correct reverse-time-order. The result of this is that under certain
obscure circumstances, inodes get stuck and basically never get written back.
It has been reported a couple of times, but nobody really cared much because
most people use ordered-mode journalling filesystems, which take care of the
writeback independently. Plus we will _eventually_ get onto these inodes even
when the list is out of order, and a /bin/sync will still work OK.
However this is a pretty important data-integrity issue for filesystems such
as ext2.
As preparation for fixing these bugs, this patch adds a pile of fantastically
expensive debugging code which checks the sanity of the s_dirty list all over
the place, so we find out as soon as it goes bad.
The debugging code is controlled by /proc/sys/fs/inode_debug, which defaults
to off. The debugging will disable itself whenever it detects a misordering,
to avoid log spew.
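A simplified sketch of the self-disabling check, assuming an array of dirty times ordered oldest first instead of a `list_head`. `inode_debug`, `list_misordered()` and `check_list()` are hypothetical stand-ins for `sysctl_inode_debug`, `__check()` and `__check_dirty_inode_list()`; unlike `__check()`, which uses `dirtied_when == 0` as its "no previous inode" marker, this version compares by index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define TIME_BEFORE(a, b) ((long)((a) - (b)) < 0)

static int inode_debug = 1;	/* stand-in for sysctl_inode_debug */

/* Returns 1 if times[] (oldest first) ever goes backwards in time. */
static int list_misordered(const unsigned long *times, size_t n)
{
	size_t i;

	assert(times != NULL || n == 0);
	for (i = 1; i < n; i++)
		if (TIME_BEFORE(times[i], times[i - 1]))
			return 1;
	return 0;
}

/* Check one list; on the first misordering, report it and switch the
 * debugging off so later checks do not flood the log. */
static void check_list(const char *name, const unsigned long *times, size_t n)
{
	if (!inode_debug)
		return;
	if (list_misordered(times, n)) {
		inode_debug = 0;
		fprintf(stderr, "%s got screwed up\n", name);
	}
}
```

Equal timestamps are accepted, matching `__check()`, which only flags an inode dirtied strictly before its predecessor.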
We can remove all this code later.
Cc: Mike Waychison <mikew@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/fs-writeback.c | 79 +++++++++++++++++++++++++++++++++++-
include/linux/writeback.h | 1
kernel/sysctl.c | 8 +++
3 files changed, 86 insertions(+), 2 deletions(-)
--- linux-2.6.23-rc2-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc2-mm2/fs/fs-writeback.c
@@ -24,6 +24,75 @@
#include <linux/buffer_head.h>
#include "internal.h"
+int sysctl_inode_debug __read_mostly;
+
+static int __check(struct list_head *head, int print_stuff)
+{
+ struct list_head *cursor = head;
+ unsigned long dirtied_when = 0;
+
+ while ((cursor = cursor->prev) != head) {
+ struct inode *inode = list_entry(cursor, struct inode, i_list);
+ if (print_stuff) {
+ printk("%p:%lu\n", inode, inode->dirtied_when);
+ } else {
+ if (dirtied_when &&
+ time_before(inode->dirtied_when, dirtied_when))
+ return 1;
+ dirtied_when = inode->dirtied_when;
+ }
+ }
+ return 0;
+}
+
+static void __check_dirty_inode_list(struct super_block *sb,
+ struct inode *inode, const char *file, int line)
+{
+ if (!sysctl_inode_debug)
+ return;
+
+ if (__check(&sb->s_dirty, 0)) {
+ sysctl_inode_debug = 0;
+ if (inode)
+ printk("%s:%d: s_dirty got screwed up. inode=%p:%lu\n",
+ file, line, inode, inode->dirtied_when);
+ else
+ printk("%s:%d: s_dirty got screwed up\n", file, line);
+ __check(&sb->s_dirty, 1);
+ }
+ if (__check(&sb->s_io, 0)) {
+ sysctl_inode_debug = 0;
+ if (inode)
+ printk("%s:%d: s_io got screwed up. inode=%p:%lu\n",
+ file, line, inode, inode->dirtied_when);
+ else
+ printk("%s:%d: s_io got screwed up\n", file, line);
+ __check(&sb->s_io, 1);
+ }
+ if (__check(&sb->s_more_io, 0)) {
+ sysctl_inode_debug = 0;
+ if (inode)
+ printk("%s:%d: s_more_io got screwed up. inode=%p:%lu\n",
+ file, line, inode, inode->dirtied_when);
+ else
+ printk("%s:%d: s_more_io got screwed up\n", file, line);
+ __check(&sb->s_more_io, 1);
+ }
+}
+
+#define check_dirty_inode_list(sb) \
+ do { \
+ if (unlikely(sysctl_inode_debug)) \
+ __check_dirty_inode_list(sb, NULL, __FILE__, __LINE__); \
+ } while (0)
+
+#define check_dirty_inode(inode) \
+ do { \
+ if (unlikely(sysctl_inode_debug)) \
+ __check_dirty_inode_list(inode->i_sb, inode, \
+ __FILE__, __LINE__); \
+ } while (0)
+
/**
* __mark_inode_dirty - internal function
* @inode: inode to mark
@@ -122,8 +191,10 @@ void __mark_inode_dirty(struct inode *in
* reposition it (that would break s_dirty time-ordering).
*/
if (!was_dirty) {
+ check_dirty_inode(inode);
inode->dirtied_when = jiffies;
list_move(&inode->i_list, &sb->s_dirty);
+ check_dirty_inode(inode);
}
}
out:
@@ -152,6 +223,7 @@ static void redirty_tail(struct inode *i
{
struct super_block *sb = inode->i_sb;
+ check_dirty_inode(inode);
if (!list_empty(&sb->s_dirty)) {
struct inode *tail_inode;
@@ -161,6 +233,7 @@ static void redirty_tail(struct inode *i
inode->dirtied_when = jiffies;
}
list_move(&inode->i_list, &sb->s_dirty);
+ check_dirty_inode(inode);
}
/*
@@ -452,8 +525,10 @@ int generic_sync_sb_inodes(struct super_
if (!ret)
ret = err;
if (wbc->sync_mode == WB_SYNC_HOLD) {
+ check_dirty_inode(inode);
inode->dirtied_when = jiffies;
list_move(&inode->i_list, &sb->s_dirty);
+ check_dirty_inode(inode);
}
if (current_is_pdflush())
writeback_release(bdi);
--- linux-2.6.23-rc2-mm2.orig/include/linux/writeback.h
+++ linux-2.6.23-rc2-mm2/include/linux/writeback.h
@@ -140,5 +140,6 @@ void writeback_set_ratelimit(void);
extern int nr_pdflush_threads; /* Global so it can be exported to sysctl
read-only. */
+extern int sysctl_inode_debug;
#endif /* WRITEBACK_H */
--- linux-2.6.23-rc2-mm2.orig/kernel/sysctl.c
+++ linux-2.6.23-rc2-mm2/kernel/sysctl.c
@@ -1238,6 +1238,14 @@ static struct ctl_table fs_table[] = {
.mode = 0644,
.proc_handler = &proc_dointvec,
},
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "inode_debug",
+ .data = &sysctl_inode_debug,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
{
.ctl_name = CTL_UNNUMBERED,