* Re: Page Cache writeback too slow, SSD/noop scheduler/ext2
[not found] ` <200903250148.53644.nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
@ 2009-03-25 5:26 ` Wu Fengguang
2009-03-27 16:59 ` Jos Houtman
0 siblings, 1 reply; 8+ messages in thread
From: Wu Fengguang @ 2009-03-25 5:26 UTC (permalink / raw)
To: Nick Piggin
Cc: Jos Houtman, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Jeff Layton, Dave Chinner,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jens.axboe-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
[-- Attachment #1: Type: text/plain, Size: 4404 bytes --]
On Wed, Mar 25, 2009 at 01:48:53AM +1100, Nick Piggin wrote:
> On Monday 23 March 2009 03:53:29 Jos Houtman wrote:
> > On 3/21/09 11:53 AM, "Andrew Morton" <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:
> > > On Fri, 20 Mar 2009 19:26:06 +0100 Jos Houtman <jos-vMeIAzyucXQ@public.gmane.org> wrote:
> > >> Hi,
> > >>
> > >> We have hit a problem where the page-cache writeback algorithm is not
> > >> keeping up.
> > >> When memory gets low, this results in very irregular performance
> > >> drops.
> > >>
> > >> Our setup is as follows:
> > >> 30 x Quad core machine with 64GB ram.
> > >> These are single purpose machines running MySQL.
> > >> Kernel version: 2.6.28.7
> > >> A dedicated SSD drive for the ext2 database partition
> > >> Noop scheduler for the ssd drive.
> > >>
> > >>
> > >> The current hypothesis is as follows:
> > >> The wk_update function does not write enough dirty pages, which allows
> > >> the number of dirty pages to grow to the dirty_background limit.
> > >> When memory is low, background_writeout() comes around and
> > >> *forcefully* writes dirty pages to disk.
> > >> This forced write fills the disk queue and starves read calls that MySQL
> > >> is trying to do: basically killing performance for a few seconds. This
> > >> pattern repeats as soon as the cleared memory is filled again.
> > >>
> > >> Decreasing the dirty_writeback_centisecs to 100 doesn't help.
> > >>
> > >> I don't know why this is, but I did some preliminary tracing using
> > >> systemtap and it seems that the majority of the time the wk_update call
> > >> decides to do nothing.
> > >>
> > >> Doubling /sys/block/sdb/queue/nr_requests to 256 seems to help a bit:
> > >> the nr_dirty count is increasing more slowly.
> > >> But I am unsure of the side-effects and am afraid of increasing the
> > >> starvation problem for MySQL.
> > >>
> > >>
> > >> I am very much willing to work on this issue and see it fixed, but
> > >> would like to tap into the knowledge of people here.
> > >> So:
> > >> * Have more people seen this or similar issues?
> > >> * Is the hypothesis above a viable one?
> > >> * Suggestions/pointers for further research and statistics I should
> > >> measure to improve the understanding of this problem.
> > >
> > > I don't think that noop-iosched tries to do anything to prevent
> > > writes-starve-reads. Do you get better behaviour from any of the other
> > > IO schedulers?
> >
> > I did a quick stress test and cfq does not immediately seem to hurt
> > performance, although some of my colleagues have tested this in the past
> > with the opposite results (which is why we use noop).
> >
> > But regardless of the scheduler, the real problem is the writeback
> > algorithm not keeping up.
> > We can grow 600K dirty pages during the day, and only ~300K are flushed
> > to disk during the night hours.
> >
> > A quick look at the writeback algorithm led me to expect wk_update()
> > to flush ~1024 pages every 5 seconds, which is almost 3GB per hour.
> > It obviously does not manage to do this in our setup.
> >
> > I don't believe the speed of the SSD to be the problem: running sync
> > manually takes only a few minutes to flush 800K dirty pages to disk.
>
> kupdate surely should just continue to keep trying to write back pages
> so long as there are more old pages to clean, and the queue isn't
> congested. That seems to be the intention anyway: MAX_WRITEBACK_PAGES
> is just the number to write back in a single call, but you see
> nr_to_write is set to the number of dirty pages in the system.
>
> On your system, what must be happening is more_io is not being set.
> The logic in fs/fs-writeback.c might be busted.
Hi Jos,
I prepared a debugging patch for 2.6.28. (I cannot observe writeback
problems on my local ext2 mount.)
You can view the states of all dirty inodes by doing
modprobe filecache
echo ls dirty > /proc/filecache
cat /proc/filecache
The 'age' field shows (jiffies - inode->dirtied_when), which may also be
useful for debugging Jeff and Ian's case (if it keeps growing, then
dirtied_when is stuck).
The detailed dirty writeback traces can be retrieved by doing
echo 1 > /proc/sys/fs/dirty_debug
sleep 6s
echo 0 > /proc/sys/fs/dirty_debug
dmesg
The dmesg trace should help identify the bug in periodic writeback.
Thanks,
Fengguang
[-- Attachment #2: filecache+writeback-debug-2.6.28.patch --]
[-- Type: text/x-diff, Size: 40831 bytes --]
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -27,6 +27,7 @@ extern unsigned long max_mapnr;
extern unsigned long num_physpages;
extern void * high_memory;
extern int page_cluster;
+extern char * const zone_names[];
#ifdef CONFIG_SYSCTL
extern int sysctl_legacy_va_layout;
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -104,7 +104,7 @@ int sysctl_lowmem_reserve_ratio[MAX_NR_Z
EXPORT_SYMBOL(totalram_pages);
-static char * const zone_names[MAX_NR_ZONES] = {
+char * const zone_names[MAX_NR_ZONES] = {
#ifdef CONFIG_ZONE_DMA
"DMA",
#endif
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -1943,7 +1943,10 @@ char *__d_path(const struct path *path,
if (dentry == root->dentry && vfsmnt == root->mnt)
break;
- if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
+ if (unlikely(!vfsmnt)) {
+ if (IS_ROOT(dentry))
+ break;
+ } else if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
/* Global root? */
if (vfsmnt->mnt_parent == vfsmnt) {
goto global_root;
--- linux-2.6.orig/lib/radix-tree.c
+++ linux-2.6/lib/radix-tree.c
@@ -564,7 +564,6 @@ out:
}
EXPORT_SYMBOL(radix_tree_tag_clear);
-#ifndef __KERNEL__ /* Only the test harness uses this at present */
/**
* radix_tree_tag_get - get a tag on a radix tree node
* @root: radix tree root
@@ -627,7 +626,6 @@ int radix_tree_tag_get(struct radix_tree
}
}
EXPORT_SYMBOL(radix_tree_tag_get);
-#endif
/**
* radix_tree_next_hole - find the next hole (not-present entry)
--- linux-2.6.orig/fs/inode.c
+++ linux-2.6/fs/inode.c
@@ -82,6 +82,10 @@ static struct hlist_head *inode_hashtabl
*/
DEFINE_SPINLOCK(inode_lock);
+EXPORT_SYMBOL(inode_in_use);
+EXPORT_SYMBOL(inode_unused);
+EXPORT_SYMBOL(inode_lock);
+
/*
* iprune_mutex provides exclusion between the kswapd or try_to_free_pages
* icache shrinking path, and the umount path. Without this exclusion,
@@ -108,6 +112,14 @@ static void wake_up_inode(struct inode *
wake_up_bit(&inode->i_state, __I_LOCK);
}
+static inline void inode_created_by(struct inode *inode, struct task_struct *task)
+{
+#ifdef CONFIG_PROC_FILECACHE_EXTRAS
+ inode->i_cuid = task->uid;
+ memcpy(inode->i_comm, task->comm, sizeof(task->comm));
+#endif
+}
+
static struct inode *alloc_inode(struct super_block *sb)
{
static const struct address_space_operations empty_aops;
@@ -142,7 +154,7 @@ static struct inode *alloc_inode(struct
inode->i_bdev = NULL;
inode->i_cdev = NULL;
inode->i_rdev = 0;
- inode->dirtied_when = 0;
+ inode->dirtied_when = jiffies;
if (security_inode_alloc(inode)) {
if (inode->i_sb->s_op->destroy_inode)
inode->i_sb->s_op->destroy_inode(inode);
@@ -183,6 +195,7 @@ static struct inode *alloc_inode(struct
}
inode->i_private = NULL;
inode->i_mapping = mapping;
+ inode_created_by(inode, current);
}
return inode;
}
@@ -247,6 +260,8 @@ void __iget(struct inode * inode)
inodes_stat.nr_unused--;
}
+EXPORT_SYMBOL(__iget);
+
/**
* clear_inode - clear an inode
* @inode: inode to clear
@@ -1353,6 +1368,16 @@ void inode_double_unlock(struct inode *i
}
EXPORT_SYMBOL(inode_double_unlock);
+
+struct hlist_head * get_inode_hash_budget(unsigned long index)
+{
+ if (index >= (1 << i_hash_shift))
+ return NULL;
+
+ return inode_hashtable + index;
+}
+EXPORT_SYMBOL_GPL(get_inode_hash_budget);
+
static __initdata unsigned long ihash_entries;
static int __init set_ihash_entries(char *str)
{
--- linux-2.6.orig/fs/super.c
+++ linux-2.6/fs/super.c
@@ -45,6 +45,9 @@
LIST_HEAD(super_blocks);
DEFINE_SPINLOCK(sb_lock);
+EXPORT_SYMBOL(super_blocks);
+EXPORT_SYMBOL(sb_lock);
+
/**
* alloc_super - create new superblock
* @type: filesystem type superblock should belong to
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -230,6 +230,7 @@ unsigned long shrink_slab(unsigned long
up_read(&shrinker_rwsem);
return ret;
}
+EXPORT_SYMBOL(shrink_slab);
/* Called without lock on whether page is mapped, so answer is unstable */
static inline int page_mapping_inuse(struct page *page)
--- linux-2.6.orig/mm/swap_state.c
+++ linux-2.6/mm/swap_state.c
@@ -44,6 +44,7 @@ struct address_space swapper_space = {
.i_mmap_nonlinear = LIST_HEAD_INIT(swapper_space.i_mmap_nonlinear),
.backing_dev_info = &swap_backing_dev_info,
};
+EXPORT_SYMBOL_GPL(swapper_space);
#define INC_CACHE_INFO(x) do { swap_cache_info.x++; } while (0)
--- linux-2.6.orig/Documentation/filesystems/proc.txt
+++ linux-2.6/Documentation/filesystems/proc.txt
@@ -266,6 +266,7 @@ Table 1-4: Kernel info in /proc
driver Various drivers grouped here, currently rtc (2.4)
execdomains Execdomains, related to security (2.4)
fb Frame Buffer devices (2.4)
+ filecache Query/drop in-memory file cache
fs File system parameters, currently nfs/exports (2.4)
ide Directory containing info about the IDE subsystem
interrupts Interrupt usage
@@ -456,6 +457,88 @@ varies by architecture and compile optio
> cat /proc/meminfo
+..............................................................................
+
+filecache:
+
+Provides access to the in-memory file cache.
+
+To list an index of all cached files:
+
+ echo ls > /proc/filecache
+ cat /proc/filecache
+
+The output looks like:
+
+ # filecache 1.0
+ # ino size cached cached% state refcnt dev file
+ 1026334 91 92 100 -- 66 03:02(hda2) /lib/ld-2.3.6.so
+ 233608 1242 972 78 -- 66 03:02(hda2) /lib/tls/libc-2.3.6.so
+ 65203 651 476 73 -- 1 03:02(hda2) /bin/bash
+ 1026445 261 160 61 -- 10 03:02(hda2) /lib/libncurses.so.5.5
+ 235427 10 12 100 -- 44 03:02(hda2) /lib/tls/libdl-2.3.6.so
+
+FIELD INTRO
+---------------------------------------------------------------------------
+ino inode number
+size inode size in KB
+cached cached size in KB
+cached% percent of file data cached
+state1 '-' clean; 'd' metadata dirty; 'D' data dirty
+state2 '-' unlocked; 'L' locked, normally indicates file being written out
+refcnt file reference count, it's an in-kernel one, not exactly open count
+dev major:minor numbers in hex, followed by a descriptive device name
+file file path _inside_ the filesystem. There are several special names:
+ '(noname)': the file name is not available
+ '(03:02)': the file is a block device file of major:minor
+ '...(deleted)': the named file has been deleted from the disk
+
+To list the cached pages of a particular file:
+
+ echo /bin/bash > /proc/filecache
+ cat /proc/filecache
+
+ # file /bin/bash
+ # flags R:referenced A:active U:uptodate D:dirty W:writeback M:mmap
+ # idx len state refcnt
+ 0 36 RAU__M 3
+ 36 1 RAU__M 2
+ 37 8 RAU__M 3
+ 45 2 RAU___ 1
+ 47 6 RAU__M 3
+ 53 3 RAU__M 2
+ 56 2 RAU__M 3
+
+FIELD INTRO
+----------------------------------------------------------------------------
+idx page index
+len number of pages which are cached and share the same state
+state page state of the flags listed in line two
+refcnt page reference count
+
+Careful users may notice that the file name to be queried is remembered
+between commands. Internally, the module has a global variable to store the
+file name parameter, so that it can be inherited by newly opened
+/proc/filecache files. However, this can lead to interference between
+multiple queriers. The rule here is that only root may interactively change
+the file name parameter; normal users should access the interface through
+scripts, following the code example below:
+
+ filecache = open("/proc/filecache", "r+")
+ # avoid polluting the global parameter filename
+ filecache.write("set private")
+
+To instruct the kernel to drop clean caches, dentries and inodes from memory,
+causing that memory to become free:
+
+ # drop clean file data cache (i.e. file backed pagecache)
+ echo drop pagecache > /proc/filecache
+
+ # drop clean file metadata cache (i.e. dentries and inodes)
+ echo drop slabcache > /proc/filecache
+
+Note that the drop commands are non-destructive operations; since dirty
+objects are not freeable, the user should run `sync' first.
MemTotal: 16344972 kB
MemFree: 13634064 kB
--- /dev/null
+++ linux-2.6/fs/proc/filecache.c
@@ -0,0 +1,1046 @@
+/*
+ * fs/proc/filecache.c
+ *
+ * Copyright (C) 2006, 2007 Fengguang Wu <wfg-fOMaevN1BEbsJZF79Ady7g@public.gmane.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/radix-tree.h>
+#include <linux/page-flags.h>
+#include <linux/pagevec.h>
+#include <linux/pagemap.h>
+#include <linux/vmalloc.h>
+#include <linux/writeback.h>
+#include <linux/buffer_head.h>
+#include <linux/parser.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/file.h>
+#include <linux/namei.h>
+#include <linux/module.h>
+#include <asm/uaccess.h>
+
+/*
+ * Increase minor version when new columns are added;
+ * Increase major version when existing columns are changed.
+ */
+#define FILECACHE_VERSION "1.0"
+
+/* Internal buffer sizes. The larger, the more efficient. */
+#define SBUF_SIZE (128<<10)
+#define IWIN_PAGE_ORDER 3
+#define IWIN_SIZE ((PAGE_SIZE<<IWIN_PAGE_ORDER) / sizeof(struct inode *))
+
+/*
+ * Session management.
+ *
+ * Each opened /proc/filecache file is associated with a session object.
+ * Also there is a global_session that maintains status across open()/close()
+ * (i.e. the lifetime of an opened file), so that a casual user can query the
+ * filecache via _multiple_ simple shell commands like
+ * 'echo cat /bin/bash > /proc/filecache; cat /proc/filecache'.
+ *
+ * session.query_file is the file whose cache info is to be queried.
+ * Its value determines what we get on read():
+ * - NULL: ii_*() called to show the inode index
+ * - filp: pg_*() called to show the page groups of a filp
+ *
+ * session.query_file is
+ * - cloned from global_session.query_file on open();
+ * - updated on write("cat filename");
+ * note that the new file will also be saved in global_session.query_file if
+ * session.private_session is false.
+ */
+
+struct session {
+ /* options */
+ int private_session;
+ unsigned long ls_options;
+ dev_t ls_dev;
+
+ /* parameters */
+ struct file *query_file;
+
+ /* seqfile pos */
+ pgoff_t start_offset;
+ pgoff_t next_offset;
+
+ /* inode at last pos */
+ struct {
+ unsigned long pos;
+ unsigned long state;
+ struct inode *inode;
+ struct inode *pinned_inode;
+ } ipos;
+
+ /* inode window */
+ struct {
+ unsigned long cursor;
+ unsigned long origin;
+ unsigned long size;
+ struct inode **inodes;
+ } iwin;
+};
+
+static struct session global_session;
+
+/*
+ * Session address is stored in proc_file->f_ra.start:
+ * we assume that there will be no readahead for proc_file.
+ */
+static struct session *get_session(struct file *proc_file)
+{
+ return (struct session *)proc_file->f_ra.start;
+}
+
+static void set_session(struct file *proc_file, struct session *s)
+{
+ BUG_ON(proc_file->f_ra.start);
+ proc_file->f_ra.start = (unsigned long)s;
+}
+
+static void update_global_file(struct session *s)
+{
+ if (s->private_session)
+ return;
+
+ if (global_session.query_file)
+ fput(global_session.query_file);
+
+ global_session.query_file = s->query_file;
+
+ if (global_session.query_file)
+ get_file(global_session.query_file);
+}
+
+/*
+ * Cases of the name:
+ * 1) NULL (new session)
+ * s->query_file = global_session.query_file = 0;
+ * 2) "" (ls/la)
+ * s->query_file = global_session.query_file;
+ * 3) a regular file name (cat newfile)
+ * s->query_file = global_session.query_file = newfile;
+ */
+static int session_update_file(struct session *s, char *name)
+{
+ static DEFINE_MUTEX(mutex); /* protects global_session.query_file */
+ int err = 0;
+
+ mutex_lock(&mutex);
+
+ /*
+ * We are to quit, or to list the cached files.
+ * Reset *.query_file.
+ */
+ if (!name) {
+ if (s->query_file) {
+ fput(s->query_file);
+ s->query_file = NULL;
+ }
+ update_global_file(s);
+ goto out;
+ }
+
+ /*
+ * This is a new session.
+ * Inherit options/parameters from global ones.
+ */
+ if (name[0] == '\0') {
+ *s = global_session;
+ if (s->query_file)
+ get_file(s->query_file);
+ goto out;
+ }
+
+ /*
+ * Open the named file.
+ */
+ if (s->query_file)
+ fput(s->query_file);
+ s->query_file = filp_open(name, O_RDONLY|O_LARGEFILE, 0);
+ if (IS_ERR(s->query_file)) {
+ err = PTR_ERR(s->query_file);
+ s->query_file = NULL;
+ } else
+ update_global_file(s);
+
+out:
+ mutex_unlock(&mutex);
+
+ return err;
+}
+
+static struct session *session_create(void)
+{
+ struct session *s;
+ int err = 0;
+
+ s = kmalloc(sizeof(*s), GFP_KERNEL);
+ if (s)
+ err = session_update_file(s, "");
+ else
+ err = -ENOMEM;
+
+ return err ? ERR_PTR(err) : s;
+}
+
+static void session_release(struct session *s)
+{
+ if (s->ipos.pinned_inode)
+ iput(s->ipos.pinned_inode);
+ if (s->query_file)
+ fput(s->query_file);
+ kfree(s);
+}
+
+
+/*
+ * Listing of cached files.
+ *
+ * Usage:
+ * echo > /proc/filecache # enter listing mode
+ * cat /proc/filecache # get the file listing
+ */
+
+/* code style borrowed from ib_srp.c */
+enum {
+ LS_OPT_ERR = 0,
+ LS_OPT_DIRTY = 1 << 0,
+ LS_OPT_CLEAN = 1 << 1,
+ LS_OPT_INUSE = 1 << 2,
+ LS_OPT_EMPTY = 1 << 3,
+ LS_OPT_ALL = 1 << 4,
+ LS_OPT_DEV = 1 << 5,
+};
+
+static match_table_t ls_opt_tokens = {
+ { LS_OPT_DIRTY, "dirty" },
+ { LS_OPT_CLEAN, "clean" },
+ { LS_OPT_INUSE, "inuse" },
+ { LS_OPT_EMPTY, "empty" },
+ { LS_OPT_ALL, "all" },
+ { LS_OPT_DEV, "dev=%s" },
+ { LS_OPT_ERR, NULL }
+};
+
+static int ls_parse_options(const char *buf, struct session *s)
+{
+ substring_t args[MAX_OPT_ARGS];
+ char *options, *sep_opt;
+ char *p;
+ int token;
+ int ret = 0;
+
+ if (!buf)
+ return 0;
+ options = kstrdup(buf, GFP_KERNEL);
+ if (!options)
+ return -ENOMEM;
+
+ s->ls_options = 0;
+ sep_opt = options;
+ while ((p = strsep(&sep_opt, " ")) != NULL) {
+ if (!*p)
+ continue;
+
+ token = match_token(p, ls_opt_tokens, args);
+
+ switch (token) {
+ case LS_OPT_DIRTY:
+ case LS_OPT_CLEAN:
+ case LS_OPT_INUSE:
+ case LS_OPT_EMPTY:
+ case LS_OPT_ALL:
+ s->ls_options |= token;
+ break;
+ case LS_OPT_DEV:
+ p = match_strdup(args);
+ if (!p) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ if (*p == '/') {
+ struct kstat stat;
+ struct nameidata nd;
+ ret = path_lookup(p, LOOKUP_FOLLOW, &nd);
+ if (!ret)
+ ret = vfs_getattr(nd.path.mnt,
+ nd.path.dentry, &stat);
+ if (!ret)
+ s->ls_dev = stat.rdev;
+ } else
+ s->ls_dev = simple_strtoul(p, NULL, 0);
+ /* printk("%lx %s\n", (long)s->ls_dev, p); */
+ kfree(p);
+ break;
+
+ default:
+ printk(KERN_WARNING "unknown parameter or missing value "
+ "'%s' in ls command\n", p);
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+out:
+ kfree(options);
+ return ret;
+}
+
+/*
+ * Add possible filters here.
+ * No permission check: we cannot verify the path's permission anyway.
+ * We simply demand root privilege for accessing /proc/filecache.
+ */
+static int may_show_inode(struct session *s, struct inode *inode)
+{
+ if (!atomic_read(&inode->i_count))
+ return 0;
+ if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
+ return 0;
+ if (!inode->i_mapping)
+ return 0;
+
+ if (s->ls_dev && s->ls_dev != inode->i_sb->s_dev)
+ return 0;
+
+ if (s->ls_options & LS_OPT_ALL)
+ return 1;
+
+ if (!(s->ls_options & LS_OPT_EMPTY) && !inode->i_mapping->nrpages)
+ return 0;
+
+ if ((s->ls_options & LS_OPT_DIRTY) && !(inode->i_state & I_DIRTY))
+ return 0;
+
+ if ((s->ls_options & LS_OPT_CLEAN) && (inode->i_state & I_DIRTY))
+ return 0;
+
+ if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
+ S_ISLNK(inode->i_mode) || S_ISBLK(inode->i_mode)))
+ return 0;
+
+ return 1;
+}
+
+/*
+ * Full: there are more data following.
+ */
+static int iwin_full(struct session *s)
+{
+ return !s->iwin.cursor ||
+ s->iwin.cursor > s->iwin.origin + s->iwin.size;
+}
+
+static int iwin_push(struct session *s, struct inode *inode)
+{
+ if (!may_show_inode(s, inode))
+ return 0;
+
+ s->iwin.cursor++;
+
+ if (s->iwin.size >= IWIN_SIZE)
+ return 1;
+
+ if (s->iwin.cursor > s->iwin.origin)
+ s->iwin.inodes[s->iwin.size++] = inode;
+ return 0;
+}
+
+/*
+ * Traverse the inode lists in order - newest first.
+ * And fill @s->iwin.inodes with inodes positioned in [@pos, @pos+IWIN_SIZE).
+ */
+static int iwin_fill(struct session *s, unsigned long pos)
+{
+ struct inode *inode;
+ struct super_block *sb;
+
+ s->iwin.origin = pos;
+ s->iwin.cursor = 0;
+ s->iwin.size = 0;
+
+ /*
+ * We have a cursor inode, clean and expected to be unchanged.
+ */
+ if (s->ipos.inode && pos >= s->ipos.pos &&
+ !(s->ipos.state & I_DIRTY) &&
+ s->ipos.state == s->ipos.inode->i_state) {
+ inode = s->ipos.inode;
+ s->iwin.cursor = s->ipos.pos;
+ goto continue_from_saved;
+ }
+
+ if (s->ls_options & LS_OPT_CLEAN)
+ goto clean_inodes;
+
+ spin_lock(&sb_lock);
+ list_for_each_entry(sb, &super_blocks, s_list) {
+ if (s->ls_dev && s->ls_dev != sb->s_dev)
+ continue;
+
+ list_for_each_entry(inode, &sb->s_dirty, i_list) {
+ if (iwin_push(s, inode))
+ goto out_full_unlock;
+ }
+ list_for_each_entry(inode, &sb->s_io, i_list) {
+ if (iwin_push(s, inode))
+ goto out_full_unlock;
+ }
+ }
+ spin_unlock(&sb_lock);
+
+clean_inodes:
+ list_for_each_entry(inode, &inode_in_use, i_list) {
+ if (iwin_push(s, inode))
+ goto out_full;
+continue_from_saved:
+ ;
+ }
+
+ if (s->ls_options & LS_OPT_INUSE)
+ return 0;
+
+ list_for_each_entry(inode, &inode_unused, i_list) {
+ if (iwin_push(s, inode))
+ goto out_full;
+ }
+
+ return 0;
+
+out_full_unlock:
+ spin_unlock(&sb_lock);
+out_full:
+ return 1;
+}
+
+static struct inode *iwin_inode(struct session *s, unsigned long pos)
+{
+ if ((iwin_full(s) && pos >= s->iwin.origin + s->iwin.size)
+ || pos < s->iwin.origin)
+ iwin_fill(s, pos);
+
+ if (pos >= s->iwin.cursor)
+ return NULL;
+
+ s->ipos.pos = pos;
+ s->ipos.inode = s->iwin.inodes[pos - s->iwin.origin];
+ BUG_ON(!s->ipos.inode);
+ return s->ipos.inode;
+}
+
+static void show_inode(struct seq_file *m, struct inode *inode)
+{
+ char state[] = "--"; /* dirty, locked */
+ struct dentry *dentry;
+ loff_t size = i_size_read(inode);
+ unsigned long nrpages;
+ int percent;
+ int refcnt;
+ int shift;
+
+ if (!size)
+ size++;
+
+ if (inode->i_mapping)
+ nrpages = inode->i_mapping->nrpages;
+ else {
+ nrpages = 0;
+ WARN_ON(1);
+ }
+
+ for (shift = 0; (size >> shift) > ULONG_MAX / 128; shift += 12)
+ ;
+ percent = min(100UL, (((100 * nrpages) >> shift) << PAGE_CACHE_SHIFT) /
+ (unsigned long)(size >> shift));
+
+ if (inode->i_state & (I_DIRTY_DATASYNC|I_DIRTY_PAGES))
+ state[0] = 'D';
+ else if (inode->i_state & I_DIRTY_SYNC)
+ state[0] = 'd';
+
+ if (inode->i_state & I_LOCK)
+ state[1] = 'L'; /* state[1] is the lock column; state[0] holds dirtiness */
+
+ refcnt = 0;
+ list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
+ refcnt += atomic_read(&dentry->d_count);
+ }
+
+ seq_printf(m, "%10lu %10llu %8lu %7d ",
+ inode->i_ino,
+ DIV_ROUND_UP(size, 1024),
+ nrpages << (PAGE_CACHE_SHIFT - 10),
+ percent);
+
+ seq_printf(m, "%6d %5s %9lu ",
+ refcnt,
+ state,
+ (jiffies - inode->dirtied_when) / HZ);
+
+#ifdef CONFIG_PROC_FILECACHE_EXTRAS
+ seq_printf(m, "%8u %5u %-16s",
+ inode->i_access_count,
+ inode->i_cuid,
+ inode->i_comm);
+#endif
+
+ seq_printf(m, "%02x:%02x(%s)\t",
+ MAJOR(inode->i_sb->s_dev),
+ MINOR(inode->i_sb->s_dev),
+ inode->i_sb->s_id);
+
+ if (list_empty(&inode->i_dentry)) {
+ if (!atomic_read(&inode->i_count))
+ seq_puts(m, "(noname)\n");
+ else
+ seq_printf(m, "(%02x:%02x)\n",
+ imajor(inode), iminor(inode));
+ } else {
+ struct path path = {
+ .mnt = NULL,
+ .dentry = list_entry(inode->i_dentry.next,
+ struct dentry, d_alias)
+ };
+
+ seq_path(m, &path, " \t\n\\");
+ seq_putc(m, '\n');
+ }
+}
+
+static int ii_show(struct seq_file *m, void *v)
+{
+ unsigned long index = *(loff_t *) v;
+ struct session *s = m->private;
+ struct inode *inode;
+
+ if (index == 0) {
+ seq_puts(m, "# filecache " FILECACHE_VERSION "\n");
+ seq_puts(m, "# ino size cached cached% "
+ "refcnt state age "
+#ifdef CONFIG_PROC_FILECACHE_EXTRAS
+ "accessed uid process "
+#endif
+ "dev\t\tfile\n");
+ }
+
+ inode = iwin_inode(s, index);
+ show_inode(m, inode);
+
+ return 0;
+}
+
+static void *ii_start(struct seq_file *m, loff_t *pos)
+{
+ struct session *s = m->private;
+
+ s->iwin.size = 0;
+ s->iwin.inodes = (struct inode **)
+ __get_free_pages(GFP_KERNEL, IWIN_PAGE_ORDER);
+ if (!s->iwin.inodes)
+ return NULL;
+
+ spin_lock(&inode_lock);
+
+ return iwin_inode(s, *pos) ? pos : NULL;
+}
+
+static void *ii_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ struct session *s = m->private;
+
+ (*pos)++;
+ return iwin_inode(s, *pos) ? pos : NULL;
+}
+
+static void ii_stop(struct seq_file *m, void *v)
+{
+ struct session *s = m->private;
+ struct inode *inode = s->ipos.inode;
+
+ if (!s->iwin.inodes)
+ return;
+
+ if (inode) {
+ __iget(inode);
+ s->ipos.state = inode->i_state;
+ }
+ spin_unlock(&inode_lock);
+
+ free_pages((unsigned long) s->iwin.inodes, IWIN_PAGE_ORDER);
+ if (s->ipos.pinned_inode)
+ iput(s->ipos.pinned_inode);
+ s->ipos.pinned_inode = inode;
+}
+
+/*
+ * Listing of cached page ranges of a file.
+ *
+ * Usage:
+ * echo 'file name' > /proc/filecache
+ * cat /proc/filecache
+ */
+
+unsigned long page_mask;
+#define PG_MMAP PG_lru /* reuse any non-relevant flag */
+#define PG_BUFFER PG_swapcache /* ditto */
+#define PG_DIRTY PG_error /* ditto */
+#define PG_WRITEBACK PG_buddy /* ditto */
+
+/*
+ * Page state names, prefixed by their abbreviations.
+ */
+struct {
+ unsigned long mask;
+ const char *name;
+ int faked;
+} page_flag [] = {
+ {1 << PG_referenced, "R:referenced", 0},
+ {1 << PG_active, "A:active", 0},
+ {1 << PG_MMAP, "M:mmap", 1},
+
+ {1 << PG_uptodate, "U:uptodate", 0},
+ {1 << PG_dirty, "D:dirty", 0},
+ {1 << PG_writeback, "W:writeback", 0},
+ {1 << PG_reclaim, "X:readahead", 0},
+
+ {1 << PG_private, "P:private", 0},
+ {1 << PG_owner_priv_1, "O:owner", 0},
+
+ {1 << PG_BUFFER, "b:buffer", 1},
+ {1 << PG_DIRTY, "d:dirty", 1},
+ {1 << PG_WRITEBACK, "w:writeback", 1},
+};
+
+static unsigned long page_flags(struct page* page)
+{
+ unsigned long flags;
+ struct address_space *mapping = page_mapping(page);
+
+ flags = page->flags & page_mask;
+
+ if (page_mapped(page))
+ flags |= (1 << PG_MMAP);
+
+ if (page_has_buffers(page))
+ flags |= (1 << PG_BUFFER);
+
+ if (mapping) {
+ if (radix_tree_tag_get(&mapping->page_tree,
+ page_index(page),
+ PAGECACHE_TAG_WRITEBACK))
+ flags |= (1 << PG_WRITEBACK);
+
+ if (radix_tree_tag_get(&mapping->page_tree,
+ page_index(page),
+ PAGECACHE_TAG_DIRTY))
+ flags |= (1 << PG_DIRTY);
+ }
+
+ return flags;
+}
+
+static int pages_similiar(struct page* page0, struct page* page)
+{
+ if (page_count(page0) != page_count(page))
+ return 0;
+
+ if (page_flags(page0) != page_flags(page))
+ return 0;
+
+ return 1;
+}
+
+static void show_range(struct seq_file *m, struct page* page, unsigned long len)
+{
+ int i;
+ unsigned long flags;
+
+ if (!m || !page)
+ return;
+
+ seq_printf(m, "%lu\t%lu\t", page->index, len);
+
+ flags = page_flags(page);
+ for (i = 0; i < ARRAY_SIZE(page_flag); i++)
+ seq_putc(m, (flags & page_flag[i].mask) ?
+ page_flag[i].name[0] : '_');
+
+ seq_printf(m, "\t%d\n", page_count(page));
+}
+
+#define BATCH_LINES 100
+static pgoff_t show_file_cache(struct seq_file *m,
+ struct address_space *mapping, pgoff_t start)
+{
+ int i;
+ int lines = 0;
+ pgoff_t len = 0;
+ struct pagevec pvec;
+ struct page *page;
+ struct page *page0 = NULL;
+
+ for (;;) {
+ pagevec_init(&pvec, 0);
+ pvec.nr = radix_tree_gang_lookup(&mapping->page_tree,
+ (void **)pvec.pages, start + len, PAGEVEC_SIZE);
+
+ if (pvec.nr == 0) {
+ show_range(m, page0, len);
+ start = ULONG_MAX;
+ goto out;
+ }
+
+ if (!page0)
+ page0 = pvec.pages[0];
+
+ for (i = 0; i < pvec.nr; i++) {
+ page = pvec.pages[i];
+
+ if (page->index == start + len &&
+ pages_similiar(page0, page))
+ len++;
+ else {
+ show_range(m, page0, len);
+ page0 = page;
+ start = page->index;
+ len = 1;
+ if (++lines > BATCH_LINES)
+ goto out;
+ }
+ }
+ }
+
+out:
+ return start;
+}
+
+static int pg_show(struct seq_file *m, void *v)
+{
+ struct session *s = m->private;
+ struct file *file = s->query_file;
+ pgoff_t offset;
+
+ if (!file)
+ return ii_show(m, v);
+
+ offset = *(loff_t *) v;
+
+ if (!offset) { /* print header */
+ int i;
+
+ seq_puts(m, "# file ");
+ seq_path(m, &file->f_path, " \t\n\\");
+
+ seq_puts(m, "\n# flags");
+ for (i = 0; i < ARRAY_SIZE(page_flag); i++)
+ seq_printf(m, " %s", page_flag[i].name);
+
+ seq_puts(m, "\n# idx\tlen\tstate\t\trefcnt\n");
+ }
+
+ s->start_offset = offset;
+ s->next_offset = show_file_cache(m, file->f_mapping, offset);
+
+ return 0;
+}
+
+static void *file_pos(struct file *file, loff_t *pos)
+{
+ loff_t size = i_size_read(file->f_mapping->host);
+ pgoff_t end = DIV_ROUND_UP(size, PAGE_CACHE_SIZE);
+ pgoff_t offset = *pos;
+
+ return offset < end ? pos : NULL;
+}
+
+static void *pg_start(struct seq_file *m, loff_t *pos)
+{
+ struct session *s = m->private;
+ struct file *file = s->query_file;
+ pgoff_t offset = *pos;
+
+ if (!file)
+ return ii_start(m, pos);
+
+ rcu_read_lock();
+
+ if (offset - s->start_offset == 1)
+ *pos = s->next_offset;
+ return file_pos(file, pos);
+}
+
+static void *pg_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ struct session *s = m->private;
+ struct file *file = s->query_file;
+
+ if (!file)
+ return ii_next(m, v, pos);
+
+ *pos = s->next_offset;
+ return file_pos(file, pos);
+}
+
+static void pg_stop(struct seq_file *m, void *v)
+{
+ struct session *s = m->private;
+ struct file *file = s->query_file;
+
+ if (!file)
+ return ii_stop(m, v);
+
+ rcu_read_unlock();
+}
+
+struct seq_operations seq_filecache_op = {
+ .start = pg_start,
+ .next = pg_next,
+ .stop = pg_stop,
+ .show = pg_show,
+};
+
+/*
+ * Implement the manual drop-all-pagecache function
+ */
+
+#define MAX_INODES (PAGE_SIZE / sizeof(struct inode *))
+static int drop_pagecache(void)
+{
+ struct hlist_head *head;
+ struct hlist_node *node;
+ struct inode *inode;
+ struct inode **inodes;
+ unsigned long i, j, k;
+ int err = 0;
+
+ inodes = (struct inode **)__get_free_pages(GFP_KERNEL, IWIN_PAGE_ORDER);
+ if (!inodes)
+ return -ENOMEM;
+
+ for (i = 0; (head = get_inode_hash_budget(i)); i++) {
+ if (hlist_empty(head))
+ continue;
+
+ j = 0;
+ cond_resched();
+
+ /*
+ * Grab some inodes.
+ */
+ spin_lock(&inode_lock);
+ hlist_for_each (node, head) {
+ inode = hlist_entry(node, struct inode, i_hash);
+ if (!atomic_read(&inode->i_count))
+ continue;
+ if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
+ continue;
+ if (!inode->i_mapping || !inode->i_mapping->nrpages)
+ continue;
+ __iget(inode);
+ inodes[j++] = inode;
+ if (j >= MAX_INODES)
+ break;
+ }
+ spin_unlock(&inode_lock);
+
+ /*
+ * Free clean pages.
+ */
+ for (k = 0; k < j; k++) {
+ inode = inodes[k];
+ invalidate_mapping_pages(inode->i_mapping, 0, ~1);
+ iput(inode);
+ }
+
+ /*
+ * Simply ignore the remaining inodes.
+ */
+ if (j >= MAX_INODES && !err) {
+ printk(KERN_WARNING
+ "Too many collisions in the inode hash table.\n"
+ "Please boot with a larger ihash_entries=XXX.\n");
+ err = -EAGAIN;
+ }
+ }
+
+ free_pages((unsigned long) inodes, IWIN_PAGE_ORDER);
+ return err;
+}
+
+static void drop_slabcache(void)
+{
+ int nr_objects;
+
+ do {
+ nr_objects = shrink_slab(1000, GFP_KERNEL, 1000);
+ } while (nr_objects > 10);
+}
+
+/*
+ * Proc file operations.
+ */
+
+static int filecache_open(struct inode *inode, struct file *proc_file)
+{
+ struct seq_file *m;
+ struct session *s;
+ unsigned size;
+ char *buf = NULL;
+ int ret;
+
+ if (!try_module_get(THIS_MODULE))
+ return -ENOENT;
+
+ s = session_create();
+ if (IS_ERR(s)) {
+ ret = PTR_ERR(s);
+ s = NULL; /* avoid kfree() on an ERR_PTR in the error path */
+ goto out;
+ }
+ set_session(proc_file, s);
+
+ size = SBUF_SIZE;
+ buf = kmalloc(size, GFP_KERNEL);
+ if (!buf) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ret = seq_open(proc_file, &seq_filecache_op);
+ if (!ret) {
+ m = proc_file->private_data;
+ m->private = s;
+ m->buf = buf;
+ m->size = size;
+ }
+
+out:
+ if (ret) {
+ kfree(s);
+ kfree(buf);
+ module_put(THIS_MODULE);
+ }
+ return ret;
+}
+
+static int filecache_release(struct inode *inode, struct file *proc_file)
+{
+ struct session *s = get_session(proc_file);
+ int ret;
+
+ session_release(s);
+ ret = seq_release(inode, proc_file);
+ module_put(THIS_MODULE);
+ return ret;
+}
+
+ssize_t filecache_write(struct file *proc_file, const char __user * buffer,
+ size_t count, loff_t *ppos)
+{
+ struct session *s;
+ char *name;
+ int err = 0;
+
+ if (count >= PATH_MAX + 5)
+ return -ENAMETOOLONG;
+
+ name = kmalloc(count+1, GFP_KERNEL);
+ if (!name)
+ return -ENOMEM;
+
+ if (copy_from_user(name, buffer, count)) {
+ err = -EFAULT;
+ goto out;
+ }
+
+ /* strip the optional newline */
+ if (count && name[count-1] == '\n')
+ name[count-1] = '\0';
+ else
+ name[count] = '\0';
+
+ s = get_session(proc_file);
+ if (!strcmp(name, "set private")) {
+ s->private_session = 1;
+ goto out;
+ }
+
+ if (!strncmp(name, "cat ", 4)) {
+ err = session_update_file(s, name+4);
+ goto out;
+ }
+
+ if (!strncmp(name, "ls", 2)) {
+ err = session_update_file(s, NULL);
+ if (!err)
+ err = ls_parse_options(name+2, s);
+ if (!err && !s->private_session) {
+ global_session.ls_dev = s->ls_dev;
+ global_session.ls_options = s->ls_options;
+ }
+ goto out;
+ }
+
+ if (!strncmp(name, "drop pagecache", 14)) {
+ err = drop_pagecache();
+ goto out;
+ }
+
+ if (!strncmp(name, "drop slabcache", 14)) {
+ drop_slabcache();
+ goto out;
+ }
+
+ /* err = -EINVAL; */
+ err = session_update_file(s, name);
+
+out:
+ kfree(name);
+
+ return err ? err : count;
+}
+
+static struct file_operations proc_filecache_fops = {
+ .owner = THIS_MODULE,
+ .open = filecache_open,
+ .release = filecache_release,
+ .write = filecache_write,
+ .read = seq_read,
+ .llseek = seq_lseek,
+};
+
+
+static __init int filecache_init(void)
+{
+ int i;
+ struct proc_dir_entry *entry;
+
+ entry = create_proc_entry("filecache", 0600, NULL);
+ if (entry)
+ entry->proc_fops = &proc_filecache_fops;
+
+ for (page_mask = i = 0; i < ARRAY_SIZE(page_flag); i++)
+ if (!page_flag[i].faked)
+ page_mask |= page_flag[i].mask;
+
+ return 0;
+}
+
+static void filecache_exit(void)
+{
+ remove_proc_entry("filecache", NULL);
+ if (global_session.query_file)
+ fput(global_session.query_file);
+}
+
+MODULE_AUTHOR("Fengguang Wu <wfg-fOMaevN1BEbsJZF79Ady7g@public.gmane.org>");
+MODULE_LICENSE("GPL");
+
+module_init(filecache_init);
+module_exit(filecache_exit);
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -685,6 +685,12 @@ struct inode {
void *i_security;
#endif
void *i_private; /* fs or device private pointer */
+
+#ifdef CONFIG_PROC_FILECACHE_EXTRAS
+ unsigned int i_access_count; /* opened how many times? */
+ uid_t i_cuid; /* opened first by which user? */
+ char i_comm[16]; /* opened first by which app? */
+#endif
};
/*
@@ -773,6 +779,13 @@ static inline unsigned imajor(const stru
return MAJOR(inode->i_rdev);
}
+static inline void inode_accessed(struct inode *inode)
+{
+#ifdef CONFIG_PROC_FILECACHE_EXTRAS
+ inode->i_access_count++;
+#endif
+}
+
extern struct block_device *I_BDEV(struct inode *inode);
struct fown_struct {
@@ -1907,6 +1920,7 @@ extern void remove_inode_hash(struct ino
static inline void insert_inode_hash(struct inode *inode) {
__insert_inode_hash(inode, inode->i_ino);
}
+struct hlist_head * get_inode_hash_budget(unsigned long index);
extern struct file * get_empty_filp(void);
extern void file_move(struct file *f, struct list_head *list);
--- linux-2.6.orig/fs/open.c
+++ linux-2.6/fs/open.c
@@ -828,6 +828,7 @@ static struct file *__dentry_open(struct
goto cleanup_all;
}
+ inode_accessed(inode);
f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping);
--- linux-2.6.orig/fs/Kconfig
+++ linux-2.6/fs/Kconfig
@@ -750,6 +750,36 @@ config CONFIGFS_FS
Both sysfs and configfs can and should exist together on the
same system. One is not a replacement for the other.
+config PROC_FILECACHE
+ tristate "/proc/filecache support"
+ default m
+ depends on PROC_FS
+ help
+ This option creates a file /proc/filecache which enables one to
+ query/drop the cached files in memory.
+
+ A quick start guide:
+
+ # echo 'ls' > /proc/filecache
+ # head /proc/filecache
+
+ # echo 'cat /bin/bash' > /proc/filecache
+ # head /proc/filecache
+
+ # echo 'drop pagecache' > /proc/filecache
+ # echo 'drop slabcache' > /proc/filecache
+
+ For more details, please check Documentation/filesystems/proc.txt .
+
+ It can be a handy tool for sysadmins and desktop users.
+
+config PROC_FILECACHE_EXTRAS
+ bool "track extra states"
+ default y
+ depends on PROC_FILECACHE
+ help
+ Track extra states that cost a little more time/space.
+
endmenu
menu "Miscellaneous filesystems"
--- linux-2.6.orig/fs/proc/Makefile
+++ linux-2.6/fs/proc/Makefile
@@ -2,7 +2,8 @@
# Makefile for the Linux proc filesystem routines.
#
-obj-$(CONFIG_PROC_FS) += proc.o
+obj-$(CONFIG_PROC_FS) += proc.o
+obj-$(CONFIG_PROC_FILECACHE) += filecache.o
proc-y := nommu.o task_nommu.o
proc-$(CONFIG_MMU) := mmu.o task_mmu.o
--- linux-2.6.orig/fs/fs-writeback.c
+++ linux-2.6/fs/fs-writeback.c
@@ -23,9 +23,66 @@
#include <linux/blkdev.h>
#include <linux/backing-dev.h>
#include <linux/buffer_head.h>
+#include <linux/pagevec.h>
#include "internal.h"
+int sysctl_dirty_debug __read_mostly;
+
+void print_page(struct page *page)
+{
+ printk(KERN_DEBUG "%lu\t%u\t%u\t%c%c%c%c%c\n",
+ page->index,
+ page_count(page),
+ page_mapcount(page),
+ PageUptodate(page) ? 'U' : '_',
+ PageDirty(page) ? 'D' : '_',
+ PageWriteback(page) ? 'W' : '_',
+ PagePrivate(page) ? 'P' : '_',
+ PageLocked(page) ? 'L' : '_');
+}
+
+void print_inode_pages(struct inode *inode)
+{
+ struct address_space *mapping = inode->i_mapping;
+ struct pagevec pvec;
+ int nr_pages;
+ int i;
+ struct dentry *dentry;
+ int dcount;
+ char *dname;
+
+ rcu_read_lock();
+ nr_pages = radix_tree_gang_lookup_tag(&mapping->page_tree,
+ (void **)pvec.pages, 0, PAGEVEC_SIZE,
+ PAGECACHE_TAG_DIRTY);
+ rcu_read_unlock();
+
+ if (list_empty(&inode->i_dentry)) {
+ dname = "";
+ dcount = 0;
+ } else {
+ dentry = list_entry(inode->i_dentry.next,
+ struct dentry, d_alias);
+ dname = dentry->d_iname;
+ dcount = atomic_read(&dentry->d_count);
+ }
+
+ printk(KERN_DEBUG "inode %lu(%s/%s) count %d,%d size %llu pages %lu\n",
+ inode->i_ino,
+ inode->i_sb->s_id,
+ dname,
+ atomic_read(&inode->i_count),
+ dcount,
+ i_size_read(inode),
+ mapping->nrpages
+ );
+
+ for (i = 0; i < nr_pages; i++)
+ print_page(pvec.pages[i]);
+}
+
+
/**
* writeback_acquire - attempt to get exclusive writeback access to a device
* @bdi: the device's backing_dev_info structure
@@ -179,6 +236,11 @@ static int write_inode(struct inode *ino
return 0;
}
+#define redirty_tail(inode) \
+ do { \
+ __redirty_tail(inode, __LINE__); \
+ } while (0)
+
/*
* Redirty an inode: set its when-it-was dirtied timestamp and move it to the
* furthest end of its superblock's dirty-inode list.
@@ -188,10 +250,16 @@ static int write_inode(struct inode *ino
* the case then the inode must have been redirtied while it was being written
* out and we don't reset its dirtied_when.
*/
-static void redirty_tail(struct inode *inode)
+static void __redirty_tail(struct inode *inode, int line)
{
struct super_block *sb = inode->i_sb;
+ if (sysctl_dirty_debug) {
+ printk(KERN_DEBUG "redirty_tail line %d: inode %lu\n",
+ line, inode->i_ino);
+ print_inode_pages(inode);
+ }
+
if (!list_empty(&sb->s_dirty)) {
struct inode *tail_inode;
@@ -203,12 +271,23 @@ static void redirty_tail(struct inode *i
list_move(&inode->i_list, &sb->s_dirty);
}
+#define requeue_io(inode) \
+ do { \
+ __requeue_io(inode, __LINE__); \
+ } while (0)
+
/*
* requeue inode for re-scanning after sb->s_io list is exhausted.
*/
-static void requeue_io(struct inode *inode)
+static void __requeue_io(struct inode *inode, int line)
{
list_move(&inode->i_list, &inode->i_sb->s_more_io);
+
+ if (sysctl_dirty_debug) {
+ printk(KERN_DEBUG "requeue_io line %d: inode %lu\n",
+ line, inode->i_ino);
+ print_inode_pages(inode);
+ }
}
static void inode_sync_complete(struct inode *inode)
--- linux-2.6.orig/include/linux/writeback.h
+++ linux-2.6/include/linux/writeback.h
@@ -159,5 +159,6 @@ void writeback_set_ratelimit(void);
extern int nr_pdflush_threads; /* Global so it can be exported to sysctl
read-only. */
+extern int sysctl_dirty_debug;
#endif /* WRITEBACK_H */
--- linux-2.6.orig/kernel/sysctl.c
+++ linux-2.6/kernel/sysctl.c
@@ -1344,6 +1344,14 @@ static struct ctl_table fs_table[] = {
.mode = 0644,
.proc_handler = &proc_dointvec,
},
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "dirty_debug",
+ .data = &sysctl_dirty_debug,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
{
.ctl_name = CTL_UNNUMBERED,
--- linux-2.6.orig/mm/page-writeback.c
+++ linux-2.6/mm/page-writeback.c
@@ -104,6 +104,35 @@ EXPORT_SYMBOL(laptop_mode);
/* End of sysctl-exported parameters */
+#define writeback_debug_report(n, wbc) do { \
+ if (sysctl_dirty_debug) \
+ __writeback_debug_report(n, wbc, \
+ __FILE__, __LINE__, __func__); \
+} while (0)
+
+void print_writeback_control(struct writeback_control *wbc)
+{
+ printk(KERN_DEBUG
+ "global dirty %lu writeback %lu nfs %lu "
+ "flags %c%c towrite %ld skipped %ld\n",
+ global_page_state(NR_FILE_DIRTY),
+ global_page_state(NR_WRITEBACK),
+ global_page_state(NR_UNSTABLE_NFS),
+ wbc->encountered_congestion ? 'C':'_',
+ wbc->more_io ? 'M':'_',
+ wbc->nr_to_write,
+ wbc->pages_skipped);
+}
+
+void __writeback_debug_report(long n, struct writeback_control *wbc,
+ const char *file, int line, const char *func)
+{
+ printk(KERN_DEBUG "%s %d %s: %s(%d) %ld\n",
+ file, line, func,
+ current->comm, current->pid,
+ n);
+ print_writeback_control(wbc);
+}
static void background_writeout(unsigned long _min_pages);
@@ -476,6 +505,7 @@ static void balance_dirty_pages(struct a
pages_written += write_chunk - wbc.nr_to_write;
get_dirty_limits(&background_thresh, &dirty_thresh,
&bdi_thresh, bdi);
+ writeback_debug_report(pages_written, &wbc);
}
/*
@@ -502,6 +532,7 @@ static void balance_dirty_pages(struct a
break; /* We've done our duty */
congestion_wait(WRITE, HZ/10);
+ writeback_debug_report(-pages_written, &wbc);
}
if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh &&
@@ -596,6 +627,11 @@ void throttle_vm_writeout(gfp_t gfp_mask
global_page_state(NR_WRITEBACK) <= dirty_thresh)
break;
congestion_wait(WRITE, HZ/10);
+ printk(KERN_DEBUG "throttle_vm_writeout: "
+ "congestion_wait on %lu+%lu > %lu\n",
+ global_page_state(NR_UNSTABLE_NFS),
+ global_page_state(NR_WRITEBACK),
+ dirty_thresh);
/*
* The caller might hold locks which can prevent IO completion
@@ -645,7 +681,9 @@ static void background_writeout(unsigned
else
break;
}
+ writeback_debug_report(min_pages, &wbc);
}
+ writeback_debug_report(min_pages, &wbc);
}
/*
@@ -718,7 +756,9 @@ static void wb_kupdate(unsigned long arg
break; /* All the old data is written */
}
nr_to_write -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
+ writeback_debug_report(nr_to_write, &wbc);
}
+ writeback_debug_report(nr_to_write, &wbc);
if (time_before(next_jif, jiffies + HZ))
next_jif = jiffies + HZ;
if (dirty_writeback_interval)
* Re: Page Cache writeback too slow, SSD/noop scheduler/ext2
2009-03-25 5:26 ` Page Cache writeback too slow, SSD/noop scheduler/ext2 Wu Fengguang
@ 2009-03-27 16:59 ` Jos Houtman
[not found] ` <C5F2C492.D4A8%jos-vMeIAzyucXQ@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Jos Houtman @ 2009-03-27 16:59 UTC (permalink / raw)
To: Wu Fengguang, Nick Piggin
Cc: linux-kernel, Jeff Layton, Dave Chinner, linux-fsdevel,
jens.axboe, akpm, hch, linux-nfs
[-- Attachment #1: Type: text/plain, Size: 1659 bytes --]
Hi,
>>
>> kupdate surely should just continue to keep trying to write back pages
>> so long as there are more old pages to clean, and the queue isn't
>> congested. That seems to be the intention anyway: MAX_WRITEBACK_PAGES
>> is just the number to write back in a single call, but you see
>> nr_to_write is set to the number of dirty pages in the system.
And when it's congested it should just wait a little bit before continuing.
>> On your system, what must be happening is more_io is not being set.
>> The logic in fs/fs-writeback.c might be busted.
I don't know about more_io, but I agree that the logic seems busted.
>
> Hi Jos,
>
> I prepared a debugging patch for 2.6.28. (I cannot observe writeback
> problems on my local ext2 mount.)
Thanx for the patch, but for next time: how should I apply it?
It seems to be context-aware (@@) but failed to apply on all kernel versions I tried:
2.6.28/2.6.28.7/2.6.29.
Because I saw the patch only a few hours ago and didn't want to block on your
reply, I decided to apply it manually and in the process ported it to 2.6.29.
As for the information the patch provided: It is most helpful.
Attached you will find a list of files containing dirty pages and the count
of their dirty pages; there is also a dmesg output in which I trace the
writeback for 40 seconds.
I did some testing of my own using printks, and what I saw is that the
inodes located on sdb1 (the database) would often pass
http://lxr.linux.no/linux+v2.6.29/fs/fs-writeback.c#L335
and then redirty_tail would be called. I haven't had time to dig deeper,
but that is my primary suspect for the moment.
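For what it's worth, here is a stand-alone model of the decision I believe is going wrong. The names and the reduction to a single function are mine, not the kernel's exact code; it only mirrors the nr_to_write check around that line:

```c
#include <assert.h>

/*
 * Toy model of the kupdate requeue decision near fs/fs-writeback.c:335
 * (2.6.29).  The enum and function names are illustrative only.
 */
enum requeue_action {
	REQUEUE_IO,	/* put on s_more_io: retried soon */
	REDIRTY_TAIL	/* redirtied: waits a full writeback cycle */
};

/*
 * Models an inode that still has dirty pages after ->writepages().
 * nr_to_write is the remaining write quota for this pass.
 */
static enum requeue_action kupdate_requeue(long nr_to_write)
{
	if (nr_to_write <= 0)
		return REQUEUE_IO;	/* quota used up: normal requeue */
	/*
	 * Quota remains, yet dirty pages were left behind.  With a
	 * constantly congested queue this is the common case, so the
	 * big database files keep being redirtied instead of written.
	 */
	return REDIRTY_TAIL;
}
```

If that model is right, a congested queue (writeback stopping with nr_to_write still > 0) always ends in redirty_tail, which would match the attached dmesg trace.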
Thanx again,
Jos
[-- Attachment #2: filecache-27-march.txt --]
[-- Type: application/octet-stream, Size: 4636 bytes --]
grep dirty /proc/vmstat; for i in $( echo ls > /proc/filecache; cat /proc/filecache | awk '{ if ($6 ~ /[D]/) print $0 }' | awk '{if ($12 ~ /database/) print $12}' ); do echo -n "$i dirty: "; echo /var/lib/mysql/$i > /proc/filecache; cat /proc/filecache | grep D | wc -l; done; echo ls > /proc/filecache; cat /proc/filecache | awk '{ if ($6 ~ /[D]/) print $0 }'
nr_dirty 494902
/database/tbltable3.MYD dirty: 58101
/database/tbltable3.MYI dirty: 146806
/database/tbltable4.MYD dirty: 85737
/database/tbltable4.MYI dirty: 101299
/database/tbltable1.MYI dirty: 1727
/database/tbltable5.MYD dirty: 27189
/database/tbltable5.MYI dirty: 16847
/database/tbltable2.MYI dirty: 3
/database/tbltable2.MYD dirty: 3
/database/tbltable1.MYD dirty: 4
0 5242880 1072 0 0 D- 6 0 0 mount 00:02(bdev) (fd:02)
0 2097152 884 0 0 D- 6 0 0 mount 00:02(bdev) (fd:03)
0 5242880 5532 0 0 D- 23 0 0 mount 00:02(bdev) (fd:01)
196632 1 4 100 1 D- 5 1 60 mysqld fd:01(dm-1) /run/mysqld/relay-log.info
196634 306501 306504 100 1 D- 15 1 60 mysqld fd:01(dm-1) /run/mysqld/mysqld-relay-bin.000002
49163 466 48 10 1 D- 12 1 0 syslog-ng fd:02(dm-2) /messages
23 1 4 100 0 D- 11 104 0 bash fd:03(dm-3) /differenceWithLastTime_dm-3
22 1 4 100 0 D- 11 104 0 bash fd:03(dm-3) /differenceWithLastTime_dm-2
21 1 4 100 0 D- 11 104 0 bash fd:03(dm-3) /differenceWithLastTime_dm-1
20 1 4 100 0 D- 11 104 0 bash fd:03(dm-3) /differenceWithLastTime_dm-0
19 1 4 100 0 D- 11 104 0 bash fd:03(dm-3) /differenceWithLastTime_sdb1
18 1 4 100 0 D- 12 104 0 bash fd:03(dm-3) /differenceWithLastTime_sda3
17 1 4 100 0 D- 12 104 0 bash fd:03(dm-3) /differenceWithLastTime_sda2
16 1 4 100 0 D- 12 104 0 bash fd:03(dm-3) /differenceWithLastTime_sda1
14 1 4 100 0 D- 12 104 0 sh fd:03(dm-3) /netdev_old
15 1 4 100 0 D- 12 104 0 perl fd:03(dm-3) /.oldnetstat
13 1 4 100 0 D- 12 520 0 sh fd:03(dm-3) /netdev_new
5865478 3778125 251852 6 2 D- 10 12 0 du 08:11(sdb1) /database/tbltable3.MYD
5865487 12871243 1132552 8 1 D- 10 10 0 du 08:11(sdb1) /database/tbltable3.MYI
5865475 21707331 503688 2 2 D- 10 11 0 du 08:11(sdb1) /database/tbltable4.MYD
5865495 16414241 909488 5 1 D- 10 10 0 du 08:11(sdb1) /database/tbltable4.MYI
11 1 4 100 1 D- 15 1 60 mysqld 08:11(sdb1) /master.info
5865477 150818 51952 34 1 D- 15 10 0 du 08:11(sdb1) /database/tbltable1.MYI
5865479 495759 131276 26 1 D- 15 10 0 du 08:11(sdb1) /database/tbltable5.MYD
5865497 903791 276828 30 1 D- 15 10 0 du 08:11(sdb1) /database/tbltable5.MYI
5865499 118388 88 0 1 D- 17 10 0 du 08:11(sdb1) /database/tbltable2.MYI
5865480 325977 120 0 1 D- 17 10 0 du 08:11(sdb1) /database/tbltable2.MYD
5865489 78503 3300 4 2 D- 18 10 0 du 08:11(sdb1) /database/tbltable1.MYD
175 1 4 100 1 D- 3100 1161 0 udevd 00:0e(tmpfs) /.udev/uevent_seqnum
[-- Attachment #3: dmesg-27-march.txt --]
[-- Type: application/octet-stream, Size: 17351 bytes --]
redirty_tail line 539: inode 10144
inode 10144(sysfs/power) count 1,0 size 0 pages 0
mm/page-writeback.c 831 wb_kupdate: pdflush(361) 447069
global dirty 438821 writeback 0 nfs 0 flags __ towrite 981 skipped 0
redirty_tail line 417: inode 5865495
inode 5865495(sdb1/tbltable4.MYI) count 2,1 size 16807325696 pages 197538
0 2 0 UD_P_
1340 2 0 UD_P_
1367 2 0 UD_P_
1417 2 0 UD_P_
3578 2 0 UD_P_
3745 2 0 UD_P_
3928 2 0 UD_P_
4116 2 0 UD_P_
4154 2 0 UD_P_
4207 2 0 UD_P_
4838 2 0 UD_P_
4839 2 0 UD_P_
4840 2 0 UD_P_
5600 2 0 UD_P_
redirty_tail line 539: inode 1522
inode 1522(sysfs/0000:01:00.0) count 1,5 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 448366
global dirty 440133 writeback 91 nfs 0 flags C_ towrite 904 skipped 0
redirty_tail line 417: inode 5865475
inode 5865475(sdb1/tbltable4.MYD) count 2,2 size 22227138512 pages 106269
373 2 0 UD_P_
1041 2 0 UD_P_
2134 2 0 UD_P_
3075 2 0 UD_P_
4563 2 0 UD_P_
5306 2 0 UD_P_
5426 2 0 UD_P_
5473 2 0 UD_P_
5657 2 0 UD_P_
5770 2 0 UD_P_
5983 2 0 UD_P_
7883 2 0 UD_P_
9061 2 0 UD_P_
9241 2 0 UD_P_
redirty_tail line 539: inode 10297
inode 10297(sysfs/subsystem) count 1,0 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 448354
global dirty 440133 writeback 91 nfs 0 flags C_ towrite 1012 skipped 0
redirty_tail line 417: inode 5865487
inode 5865487(sdb1/tbltable3.MYI) count 2,1 size 13179606016 pages 250251
0 2 0 UD_P_
65 2 0 UD_P_
109 2 0 UD_P_
195 2 0 UD_P_
200 2 0 UD_P_
368 2 0 UD_P_
473 2 0 UD_P_
481 2 0 UD_P_
530 2 0 UD_P_
547 2 0 UD_P_
695 2 0 UD_P_
735 2 0 UD_P_
751 2 0 UD_P_
970 2 0 UD_P_
redirty_tail line 539: inode 10302
inode 10302(sysfs/driver) count 1,0 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 448342
global dirty 440133 writeback 91 nfs 0 flags C_ towrite 1012 skipped 0
redirty_tail line 417: inode 5865478
inode 5865478(sdb1/tbltable3.MYD) count 2,2 size 3868679145 pages 55009
159 2 0 UD_P_
195 2 0 UD_P_
207 2 0 UD_P_
211 2 0 UD_P_
220 2 0 UD_P_
282 2 0 UD_P_
283 2 0 UD_P_
324 2 0 UD_P_
436 2 0 UD_P_
439 2 0 UD_P_
548 2 0 UD_P_
553 2 0 UD_P_
682 2 0 UD_P_
934 2 0 UD_P_
redirty_tail line 539: inode 10066
inode 10066(sysfs/timeout) count 1,0 size 4096 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 448329
global dirty 440042 writeback 91 nfs 0 flags C_ towrite 1011 skipped 0
redirty_tail line 539: inode 10289
inode 10289(sysfs/timeout) count 1,0 size 4096 pages 0
mm/page-writeback.c 831 wb_kupdate: pdflush(361) 448329
global dirty 440042 writeback 91 nfs 0 flags __ towrite 1024 skipped 0
redirty_tail line 539: inode 8766
inode 8766(sysfs/ram0) count 1,14 size 0 pages 0
mm/page-writeback.c 831 wb_kupdate: pdflush(361) 448543
global dirty 440295 writeback 0 nfs 0 flags __ towrite 1024 skipped 0
redirty_tail line 539: inode 8778
inode 8778(sysfs/power) count 1,0 size 0 pages 0
mm/page-writeback.c 831 wb_kupdate: pdflush(361) 448873
global dirty 440625 writeback 0 nfs 0 flags __ towrite 1016 skipped 0
redirty_tail line 539: inode 8780
inode 8780(sysfs/holders) count 1,0 size 0 pages 0
mm/page-writeback.c 831 wb_kupdate: pdflush(361) 449115
global dirty 440867 writeback 0 nfs 0 flags __ towrite 1021 skipped 0
redirty_tail line 539: inode 8781
inode 8781(sysfs/slaves) count 1,0 size 0 pages 0
mm/page-writeback.c 831 wb_kupdate: pdflush(361) 449441
global dirty 441193 writeback 0 nfs 0 flags __ towrite 984 skipped 0
redirty_tail line 417: inode 5865497
inode 5865497(sdb1/tbltable5.MYI) count 2,1 size 925447168 pages 63265
0 2 0 UD_P_
1172 2 0 UD_P_
1187 2 0 UD_P_
1522 2 0 UD_P_
1567 2 0 UD_P_
1720 2 0 UD_P_
1760 2 0 UD_P_
1807 2 0 UD_P_
1811 2 0 UD_P_
2272 2 0 UD_P_
2377 2 0 UD_P_
2534 2 0 UD_P_
2647 2 0 UD_P_
2777 2 0 UD_P_
redirty_tail line 539: inode 8791
inode 8791(sysfs/ram1) count 1,14 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 449463
global dirty 441220 writeback 182 nfs 0 flags C_ towrite 564 skipped 0
redirty_tail line 417: inode 5865479
inode 5865479(sdb1/tbltable5.MYD) count 2,1 size 507638336 pages 29268
0 2 0 UD_P_
1 2 0 UD_P_
2 2 0 UD_P_
3 2 0 UD_P_
4 2 0 UD_P_
8 2 0 UD_P_
12 2 0 UD_P_
16 2 0 UD_P_
22 2 0 UD_P_
26 2 0 UD_P_
32 2 0 UD_P_
33 2 0 UD_P_
40 2 0 UD_P_
41 2 0 UD_P_
redirty_tail line 539: inode 8803
inode 8803(sysfs/power) count 1,0 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 449448
global dirty 441220 writeback 182 nfs 0 flags C_ towrite 1009 skipped 0
redirty_tail line 539: inode 8805
inode 8805(sysfs/holders) count 1,0 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 449448
global dirty 441220 writeback 182 nfs 0 flags C_ towrite 1024 skipped 0
redirty_tail line 539: inode 8806
inode 8806(sysfs/slaves) count 1,0 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 449448
global dirty 441220 writeback 182 nfs 0 flags C_ towrite 1024 skipped 0
redirty_tail line 539: inode 8816
inode 8816(sysfs/ram2) count 1,14 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 449448
global dirty 441220 writeback 182 nfs 0 flags C_ towrite 1024 skipped 0
redirty_tail line 539: inode 8828
inode 8828(sysfs/power) count 1,0 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 449448
global dirty 441220 writeback 182 nfs 0 flags C_ towrite 1024 skipped 0
redirty_tail line 417: inode 5865477
inode 5865477(sdb1/tbltable1.MYI) count 2,1 size 154434560 pages 11790
0 2 0 UD_P_
68 2 0 UD_P_
106 2 0 UD_P_
131 2 0 UD_P_
185 2 0 UD_P_
244 2 0 UD_P_
252 2 0 UD_P_
331 2 0 UD_P_
393 2 0 UD_P_
394 2 0 UD_P_
395 2 0 UD_P_
400 2 0 UD_P_
482 2 0 UD_P_
781 2 0 UD_P_
redirty_tail line 539: inode 8830
inode 8830(sysfs/holders) count 1,0 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 449436
global dirty 441220 writeback 182 nfs 0 flags C_ towrite 1012 skipped 0
redirty_tail line 539: inode 8831
inode 8831(sysfs/slaves) count 1,0 size 0 pages 0
mm/page-writeback.c 831 wb_kupdate: pdflush(361) 449436
global dirty 441220 writeback 182 nfs 0 flags __ towrite 1024 skipped 0
redirty_tail line 417: inode 5865495
inode 5865495(sdb1/tbltable4.MYI) count 2,1 size 16807348224 pages 197993
0 2 0 UD_P_
1340 2 0 UD_P_
1367 2 0 UD_P_
1417 2 0 UD_P_
3578 2 0 UD_P_
3745 2 0 UD_P_
3928 2 0 UD_P_
4116 2 0 UD_P_
4154 2 0 UD_P_
4207 2 0 UD_P_
4838 2 0 UD_P_
4839 2 0 UD_P_
4840 2 0 UD_P_
5600 2 0 UD_P_
redirty_tail line 539: inode 8841
inode 8841(sysfs/ram3) count 1,14 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 449551
global dirty 441343 writeback 91 nfs 0 flags C_ towrite 893 skipped 0
redirty_tail line 417: inode 5865475
inode 5865475(sdb1/tbltable4.MYD) count 2,2 size 22227169324 pages 106507
373 2 0 UD_P_
1041 2 0 UD_P_
2134 2 0 UD_P_
3075 2 0 UD_P_
4563 2 0 UD_P_
5306 2 0 UD_P_
5426 2 0 UD_P_
5473 2 0 UD_P_
5657 2 0 UD_P_
5770 2 0 UD_P_
5983 2 0 UD_P_
7883 2 0 UD_P_
9061 2 0 UD_P_
9241 2 0 UD_P_
redirty_tail line 539: inode 8853
inode 8853(sysfs/power) count 1,0 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 449538
global dirty 441343 writeback 91 nfs 0 flags C_ towrite 1011 skipped 0
redirty_tail line 417: inode 5865487
inode 5865487(sdb1/tbltable3.MYI) count 2,1 size 13179631616 pages 251244
0 2 0 UD_P_
65 2 0 UD_P_
109 2 0 UD_P_
195 2 0 UD_P_
200 2 0 UD_P_
368 2 0 UD_P_
473 2 0 UD_P_
481 2 0 UD_P_
530 2 0 UD_P_
547 2 0 UD_P_
695 2 0 UD_P_
735 2 0 UD_P_
751 2 0 UD_P_
970 2 0 UD_P_
redirty_tail line 539: inode 8855
inode 8855(sysfs/holders) count 1,0 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 449526
global dirty 441343 writeback 91 nfs 0 flags C_ towrite 1012 skipped 0
redirty_tail line 417: inode 5865478
inode 5865478(sdb1/tbltable3.MYD) count 2,2 size 3868689540 pages 55174
159 2 0 UD_P_
195 2 0 UD_P_
207 2 0 UD_P_
211 2 0 UD_P_
220 2 0 UD_P_
282 2 0 UD_P_
283 2 0 UD_P_
324 2 0 UD_P_
436 2 0 UD_P_
439 2 0 UD_P_
548 2 0 UD_P_
553 2 0 UD_P_
682 2 0 UD_P_
934 2 0 UD_P_
redirty_tail line 539: inode 8856
inode 8856(sysfs/slaves) count 1,0 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 449514
global dirty 441343 writeback 91 nfs 0 flags C_ towrite 1012 skipped 0
redirty_tail line 539: inode 8866
inode 8866(sysfs/ram4) count 1,14 size 0 pages 0
mm/page-writeback.c 831 wb_kupdate: pdflush(361) 449514
global dirty 441343 writeback 91 nfs 0 flags __ towrite 1024 skipped 0
redirty_tail line 539: inode 8878
inode 8878(sysfs/power) count 1,0 size 0 pages 0
mm/page-writeback.c 831 wb_kupdate: pdflush(361) 449732
global dirty 441484 writeback 0 nfs 0 flags __ towrite 1000 skipped 0
redirty_tail line 539: inode 8880
inode 8880(sysfs/holders) count 1,0 size 0 pages 0
mm/page-writeback.c 831 wb_kupdate: pdflush(361) 450062
global dirty 441814 writeback 0 nfs 0 flags __ towrite 1024 skipped 0
redirty_tail line 539: inode 8881
inode 8881(sysfs/slaves) count 1,0 size 0 pages 0
mm/page-writeback.c 831 wb_kupdate: pdflush(361) 450496
global dirty 442248 writeback 0 nfs 0 flags __ towrite 1014 skipped 0
redirty_tail line 539: inode 8891
inode 8891(sysfs/ram5) count 1,14 size 0 pages 0
mm/page-writeback.c 831 wb_kupdate: pdflush(361) 450655
global dirty 442407 writeback 0 nfs 0 flags __ towrite 1018 skipped 0
redirty_tail line 417: inode 5865497
inode 5865497(sdb1/tbltable5.MYI) count 2,1 size 925447168 pages 63374
0 2 0 UD_P_
1172 2 0 UD_P_
1187 2 0 UD_P_
1522 2 0 UD_P_
1567 2 0 UD_P_
1720 2 0 UD_P_
1760 2 0 UD_P_
1807 2 0 UD_P_
1811 2 0 UD_P_
2272 2 0 UD_P_
2377 2 0 UD_P_
2534 2 0 UD_P_
2647 2 0 UD_P_
2777 2 0 UD_P_
redirty_tail line 539: inode 8903
inode 8903(sysfs/power) count 1,0 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 450809
global dirty 442608 writeback 91 nfs 0 flags C_ towrite 886 skipped 0
redirty_tail line 417: inode 5865479
inode 5865479(sdb1/tbltable5.MYD) count 2,1 size 507638658 pages 29338
0 2 0 UD_P_
1 2 0 UD_P_
2 2 0 UD_P_
3 2 0 UD_P_
4 2 0 UD_P_
8 2 0 UD_P_
12 2 0 UD_P_
16 2 0 UD_P_
22 2 0 UD_P_
26 2 0 UD_P_
32 2 0 UD_P_
33 2 0 UD_P_
40 2 0 UD_P_
41 2 0 UD_P_
redirty_tail line 539: inode 8905
inode 8905(sysfs/holders) count 1,0 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 450793
global dirty 442517 writeback 91 nfs 0 flags C_ towrite 1008 skipped 0
redirty_tail line 417: inode 5865477
inode 5865477(sdb1/tbltable1.MYI) count 2,1 size 154434560 pages 11815
0 2 0 UD_P_
68 2 0 UD_P_
106 2 0 UD_P_
131 2 0 UD_P_
185 2 0 UD_P_
244 2 0 UD_P_
252 2 0 UD_P_
331 2 0 UD_P_
393 2 0 UD_P_
394 2 0 UD_P_
395 2 0 UD_P_
400 2 0 UD_P_
482 2 0 UD_P_
781 2 0 UD_P_
redirty_tail line 539: inode 8906
inode 8906(sysfs/slaves) count 1,0 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 450781
global dirty 442517 writeback 182 nfs 0 flags C_ towrite 1012 skipped 0
redirty_tail line 539: inode 8916
inode 8916(sysfs/ram6) count 1,14 size 0 pages 0
mm/page-writeback.c 831 wb_kupdate: pdflush(361) 450781
global dirty 442517 writeback 182 nfs 0 flags __ towrite 1024 skipped 0
redirty_tail line 417: inode 5865495
inode 5865495(sdb1/tbltable4.MYI) count 2,1 size 16807370752 pages 198676
0 2 0 UD_P_
1340 2 0 UD_P_
1367 2 0 UD_P_
1417 2 0 UD_P_
3578 2 0 UD_P_
3745 2 0 UD_P_
3928 2 0 UD_P_
4116 2 0 UD_P_
4154 2 0 UD_P_
4207 2 0 UD_P_
4838 2 0 UD_P_
4839 2 0 UD_P_
4840 2 0 UD_P_
5600 2 0 UD_P_
redirty_tail line 539: inode 8928
inode 8928(sysfs/power) count 1,0 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 450654
global dirty 442401 writeback 91 nfs 0 flags C_ towrite 644 skipped 0
redirty_tail line 417: inode 5865475
inode 5865475(sdb1/tbltable4.MYD) count 2,2 size 22227197236 pages 106846
373 2 0 UD_P_
1041 2 0 UD_P_
2134 2 0 UD_P_
3075 2 0 UD_P_
4563 2 0 UD_P_
5306 2 0 UD_P_
5426 2 0 UD_P_
5473 2 0 UD_P_
5657 2 0 UD_P_
5770 2 0 UD_P_
5983 2 0 UD_P_
7883 2 0 UD_P_
9061 2 0 UD_P_
9241 2 0 UD_P_
redirty_tail line 539: inode 8930
inode 8930(sysfs/holders) count 1,0 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 450641
global dirty 442401 writeback 91 nfs 0 flags C_ towrite 1011 skipped 0
redirty_tail line 417: inode 5865487
inode 5865487(sdb1/tbltable3.MYI) count 2,1 size 13179643904 pages 251935
0 2 0 UD_P_
65 2 0 UD_P_
109 2 0 UD_P_
195 2 0 UD_P_
200 2 0 UD_P_
368 2 0 UD_P_
473 2 0 UD_P_
481 2 0 UD_P_
530 2 0 UD_P_
547 2 0 UD_P_
695 2 0 UD_P_
735 2 0 UD_P_
751 2 0 UD_P_
970 2 0 UD_P_
redirty_tail line 539: inode 8931
inode 8931(sysfs/slaves) count 1,0 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 450629
global dirty 442401 writeback 91 nfs 0 flags C_ towrite 1012 skipped 0
redirty_tail line 417: inode 5865478
inode 5865478(sdb1/tbltable3.MYD) count 2,2 size 3868692045 pages 55323
159 2 0 UD_P_
195 2 0 UD_P_
207 2 0 UD_P_
211 2 0 UD_P_
220 2 0 UD_P_
282 2 0 UD_P_
283 2 0 UD_P_
324 2 0 UD_P_
436 2 0 UD_P_
439 2 0 UD_P_
548 2 0 UD_P_
553 2 0 UD_P_
682 2 0 UD_P_
934 2 0 UD_P_
redirty_tail line 539: inode 8941
inode 8941(sysfs/ram7) count 1,14 size 0 pages 0
mm/page-writeback.c 829 wb_kupdate: pdflush(361) 450617
global dirty 442401 writeback 91 nfs 0 flags C_ towrite 1012 skipped 0
redirty_tail line 539: inode 8953
inode 8953(sysfs/power) count 1,0 size 0 pages 0
mm/page-writeback.c 831 wb_kupdate: pdflush(361) 450617
global dirty 442401 writeback 91 nfs 0 flags __ towrite 1024 skipped 0
redirty_tail line 539: inode 8955
inode 8955(sysfs/holders) count 1,0 size 0 pages 0
mm/page-writeback.c 831 wb_kupdate: pdflush(361) 450881
global dirty 442633 writeback 0 nfs 0 flags __ towrite 977 skipped 0
* Re: Page Cache writeback too slow, SSD/noop scheduler/ext2
[not found] ` <C5F2C492.D4A8%jos-vMeIAzyucXQ@public.gmane.org>
@ 2009-03-29 2:32 ` Wu Fengguang
2009-03-30 16:47 ` Jos Houtman
0 siblings, 1 reply; 8+ messages in thread
From: Wu Fengguang @ 2009-03-29 2:32 UTC (permalink / raw)
To: Jos Houtman
Cc: Nick Piggin, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Jeff Layton, Dave Chinner,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jens.axboe-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
[-- Attachment #1: Type: text/plain, Size: 3171 bytes --]
On Sat, Mar 28, 2009 at 12:59:43AM +0800, Jos Houtman wrote:
> Hi,
>
> >>
> >> kupdate surely should just continue to keep trying to write back pages
> >> so long as there are more old pages to clean, and the queue isn't
> >> congested. That seems to be the intention anyway: MAX_WRITEBACK_PAGES
> >> is just the number to write back in a single call, but you see
> >> nr_to_write is set to the number of dirty pages in the system.
>
> And when it's congested it should just wait a little bit before continuing.
>
> >> On your system, what must be happening is more_io is not being set.
> >> The logic in fs/fs-writeback.c might be busted.
>
> I don't know about more_io, but I agree that the logic seems busted.
>
> >
> > Hi Jos,
> >
> > I prepared a debugging patch for 2.6.28. (I cannot observe writeback
> > problems on my local ext2 mount.)
>
> Thanx for the patch, but for the next time: How should I apply it?
> it seems to be context aware (@@) and broke on all kernel versions I tried
> 2.6.28/2.6.28.7/2.6.29
Do you mean that the patch applies after removing " @@.*$"?
To be safe, I created the patch with quilt as well as git, for 2.6.29.
> Because I saw the patch only a few hours ago and didn't want to block on your
> reply, I decided to patch it manually and in the process ported it to 2.6.29.
>
> As for the information the patch provided: It is most helpful.
>
> Attached you will find a list of files containing dirty pages and the count
> of their dirty pages; there is also a dmesg output where I trace the
> writeback for 40 seconds.
They helped, thank you!
> I did some testing on my own using printk's, and what I saw is that the
> inodes located on sdb1 (the database) would often pass
> http://lxr.linux.no/linux+v2.6.29/fs/fs-writeback.c#L335
> and then redirty_tail would be called. I haven't had the time to dig deeper,
> but that is my primary suspect for the moment.
You are right. In your case, there are several big dirty files in sdb1,
and the sdb write queue is constantly (almost-)congested. The SSD write
speed is so slow, that in each round of sdb1 writeback, it begins with
an uncongested queue, but quickly fills up after writing some pages.
Hence all the inodes will get redirtied because of (nr_to_write > 0).
The following quick fix should solve the slow-writeback-on-congested-SSD
problem. However the writeback sequence is suboptimal: it sync-and-requeue
each file until congested (in your case about 3~600 pages) instead of
until MAX_WRITEBACK_PAGES=1024 pages.
A more complete fix would be turning MAX_WRITEBACK_PAGES into an exact
per-file limit. It has been sitting in my todo list for quite a while...
Thanks,
Fengguang
---
fs/fs-writeback.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- mm.orig/fs/fs-writeback.c
+++ mm/fs/fs-writeback.c
@@ -325,7 +325,8 @@ __sync_single_inode(struct inode *inode,
* soon as the queue becomes uncongested.
*/
inode->i_state |= I_DIRTY_PAGES;
- if (wbc->nr_to_write <= 0) {
+ if (wbc->nr_to_write <= 0 ||
+ wbc->encountered_congestion) {
/*
* slice used up: queue for next turn
*/
[-- Attachment #2: writeback-requeue-congestion-quickfix.patch --]
[-- Type: text/x-diff, Size: 486 bytes --]
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index e3fe991..da5f88d 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -325,7 +325,8 @@ __sync_single_inode(struct inode *inode, struct writeback_control *wbc)
* soon as the queue becomes uncongested.
*/
inode->i_state |= I_DIRTY_PAGES;
- if (wbc->nr_to_write <= 0) {
+ if (wbc->nr_to_write <= 0 ||
+ wbc->encountered_congestion) {
/*
* slice used up: queue for next turn
*/
* Re: Page Cache writeback too slow, SSD/noop scheduler/ext2
2009-03-29 2:32 ` Wu Fengguang
@ 2009-03-30 16:47 ` Jos Houtman
[not found] ` <C5F6B627.D9D0%jos-vMeIAzyucXQ@public.gmane.org>
2009-03-31 12:16 ` Jos Houtman
0 siblings, 2 replies; 8+ messages in thread
From: Jos Houtman @ 2009-03-30 16:47 UTC (permalink / raw)
To: Wu Fengguang
Cc: Nick Piggin, linux-kernel, Jeff Layton, Dave Chinner,
linux-fsdevel, jens.axboe, akpm, hch, linux-nfs
>> Thanx for the patch, but for the next time: How should I apply it?
>> it seems to be context aware (@@) and broke on all kernel versions I tried
>> 2.6.28/2.6.28.7/2.6.29
>
> Do you mean that the patch applies after removing " @@.*$"?
I didn't try that, but this time it worked. So it was probably my error.
>
> You are right. In your case, there are several big dirty files in sdb1,
> and the sdb write queue is constantly (almost-)congested. The SSD write
> speed is so slow, that in each round of sdb1 writeback, it begins with
> an uncongested queue, but quickly fills up after writing some pages.
> Hence all the inodes will get redirtied because of (nr_to_write > 0).
>
> The following quick fix should solve the slow-writeback-on-congested-SSD
> problem. However the writeback sequence is suboptimal: it sync-and-requeue
> each file until congested (in your case about 3~600 pages) instead of
> until MAX_WRITEBACK_PAGES=1024 pages.
Yeah that fixed it, but performance dropped due to the more constant
congestion. So I will need to try some different io schedulers.
Next to that I was wondering if there are any plans to make sure that not
all dirty-files are written back in the same interval.
In my case all database files are written back each 30 seconds, while I
would prefer them to be more divided over the interval.
Thanks,
Jos
* Re: Page Cache writeback too slow, SSD/noop scheduler/ext2
[not found] ` <C5F6B627.D9D0%jos-vMeIAzyucXQ@public.gmane.org>
@ 2009-03-31 0:28 ` Wu Fengguang
0 siblings, 0 replies; 8+ messages in thread
From: Wu Fengguang @ 2009-03-31 0:28 UTC (permalink / raw)
To: Jos Houtman
Cc: Nick Piggin, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Jeff Layton, Dave Chinner,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jens.axboe-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Tue, Mar 31, 2009 at 12:47:19AM +0800, Jos Houtman wrote:
>
> >> Thanx for the patch, but for the next time: How should I apply it?
> >> it seems to be context aware (@@) and broke on all kernel versions I tried
> >> 2.6.28/2.6.28.7/2.6.29
> >
> > Do you mean that the patch applies after removing " @@.*$"?
>
> I didn't try that, but this time it worked. So it was probably my error.
>
> >
> > You are right. In your case, there are several big dirty files in sdb1,
> > and the sdb write queue is constantly (almost-)congested. The SSD write
> > speed is so slow, that in each round of sdb1 writeback, it begins with
> > an uncongested queue, but quickly fills up after writing some pages.
> > Hence all the inodes will get redirtied because of (nr_to_write > 0).
> >
> > The following quick fix should solve the slow-writeback-on-congested-SSD
> > problem. However the writeback sequence is suboptimal: it sync-and-requeue
> > each file until congested (in your case about 3~600 pages) instead of
> > until MAX_WRITEBACK_PAGES=1024 pages.
>
> Yeah that fixed it, but performance dropped due to the more constant
> congestion. So I will need to try some different io schedulers.
Read performance or write performance?
> Next to that I was wondering if there are any plans to make sure that not
> all dirty-files are written back in the same interval.
>
> In my case all database files are written back each 30 seconds, while I
> would prefer them to be more divided over the interval.
pdflush will wake up every 5s to sync files that have been dirty for more
than 30s. So the writeback of inodes should be distributed (somewhat
randomly) into these 5s-interval wakeups due to varied dirty times.
However the distribution may well be uneven in many cases. There seem to
be conflicting goals for HDD and SSD: one favors somewhat small bursty
writebacks, the other favors smooth writeback streams. I guess the better
scheme would be bursty pdflush writebacks plus IO-scheduler-level QoS.
Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Page Cache writeback too slow, SSD/noop scheduler/ext2
2009-03-30 16:47 ` Jos Houtman
[not found] ` <C5F6B627.D9D0%jos-vMeIAzyucXQ@public.gmane.org>
@ 2009-03-31 12:16 ` Jos Houtman
[not found] ` <C5F7D654.DE6F%jos-vMeIAzyucXQ@public.gmane.org>
1 sibling, 1 reply; 8+ messages in thread
From: Jos Houtman @ 2009-03-31 12:16 UTC (permalink / raw)
To: Jos Houtman, Wu Fengguang
Cc: Nick Piggin, linux-kernel, Jeff Layton, Dave Chinner,
linux-fsdevel, jens.axboe, akpm, hch, linux-nfs
>
> Next to that I was wondering if there are any plans to make sure that not
> all dirty-files are written back in the same interval.
>
> In my case all database files are written back each 30 seconds, while I
> would prefer them to be more divided over the interval.
There is another question I have: does the writeback go through the io
scheduler? Because no matter the io scheduler or the tuning done, the
writeback algorithm totally starves the reads.
See the url below for an example with CFQ, but deadline or noop all show
this behaviour:
http://94.100.113.33/535450001-535500000/535451701-535451800/535451800_6_L7gt.jpeg
Is there anything I can do about this behaviour by creating a better
interleaving of the reads and writes?
Jos
* Re: Page Cache writeback too slow, SSD/noop scheduler/ext2
[not found] ` <C5F7D654.DE6F%jos-vMeIAzyucXQ@public.gmane.org>
@ 2009-03-31 12:31 ` Wu Fengguang
2009-03-31 14:10 ` Jos Houtman
0 siblings, 1 reply; 8+ messages in thread
From: Wu Fengguang @ 2009-03-31 12:31 UTC (permalink / raw)
To: Jos Houtman
Cc: Nick Piggin, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Jeff Layton, Dave Chinner,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jens.axboe-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Tue, Mar 31, 2009 at 08:16:52PM +0800, Jos Houtman wrote:
> >
> > Next to that I was wondering if there are any plans to make sure that not
> > all dirty-files are written back in the same interval.
> >
> > In my case all database files are written back each 30 seconds, while I
> > would prefer them to be more divided over the interval.
>
> There another question I have: does the writeback go through the io
> scheduler? Because no matter the io scheduler or the tuning done, the
> writeback algorithm totally starves the reads.
I noticed this annoying writes-starve-reads problem too. I'll look into it.
> See the url below for an example with CFQ, but deadline or noop all show
> this behaviour:
> http://94.100.113.33/535450001-535500000/535451701-535451800/535451800_6_L7gt.jpeg
>
> Is there anything I can do about this behaviour by creating a better
> interleaving of the reads and writes?
I guess it should be handled in the generic block IO layer. Once we
solve the writes-starve-reads problem, the bursty-writeback behavior
becomes a non-problem for SSD.
Thanks,
Fengguang
* Re: Page Cache writeback too slow, SSD/noop scheduler/ext2
2009-03-31 12:31 ` Wu Fengguang
@ 2009-03-31 14:10 ` Jos Houtman
0 siblings, 0 replies; 8+ messages in thread
From: Jos Houtman @ 2009-03-31 14:10 UTC (permalink / raw)
To: Wu Fengguang
Cc: Nick Piggin, linux-kernel, Jeff Layton, Dave Chinner,
linux-fsdevel, jens.axboe, akpm, hch, linux-nfs
>> There another question I have: does the writeback go through the io
>> scheduler? Because no matter the io scheduler or the tuning done, the
>> writeback algorithm totally starves the reads.
>
> I noticed this annoying writes-starve-reads problem too. I'll look into it.
Thanks
>
>> Is there anything I can do about this behaviour by creating a better
>> interleaving of the reads and writes?
>
> I guess it should be handled in the generic block io layer. Once we
> solved the writes-starve-reads problem, the bursty-writeback behavior
> becomes a no-problem for SSD.
Yeah this was the part where I figured the io-schedulers kicked in, but
obviously I was wrong :P.
If I can do anything more to help this along, let me know.
Thanks
Jos
end of thread, other threads: [~2009-03-31 14:10 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <C5EC2B99.C3B3%jos@hyves.nl>
[not found] ` <200903250148.53644.nickpiggin@yahoo.com.au>
[not found] ` <200903250148.53644.nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
2009-03-25 5:26 ` Page Cache writeback too slow, SSD/noop scheduler/ext2 Wu Fengguang
2009-03-27 16:59 ` Jos Houtman
[not found] ` <C5F2C492.D4A8%jos-vMeIAzyucXQ@public.gmane.org>
2009-03-29 2:32 ` Wu Fengguang
2009-03-30 16:47 ` Jos Houtman
[not found] ` <C5F6B627.D9D0%jos-vMeIAzyucXQ@public.gmane.org>
2009-03-31 0:28 ` Wu Fengguang
2009-03-31 12:16 ` Jos Houtman
[not found] ` <C5F7D654.DE6F%jos-vMeIAzyucXQ@public.gmane.org>
2009-03-31 12:31 ` Wu Fengguang
2009-03-31 14:10 ` Jos Houtman