* Writeback tests
@ 2011-07-14 22:52 Curt Wohlgemuth
  2011-07-15 15:33 ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Curt Wohlgemuth @ 2011-07-14 22:52 UTC (permalink / raw)
  To: Wu Fengguang, Jan Kara, Andrew Morton, Christoph Hellwig, Dave Chinner

During LSF last spring, Michael Rubin signed up to create a set of
writeback tests for use by the Linux community.  Borrowing heavily
from Fengguang's tests at

  http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/scripts/

I've created a test infrastructure for writeback testing, available from

  http://google3-2.osuosl.org/?p=tests/wbtests.git;a=summary

See the README for details on how to run tests and report them.

It uses FIO, creating multiple FIO processes with possibly different
resource restrictions, and does as much sampling and tracing as is
available on the system during the test run.

The configurations available now are fairly minimal, and all use 1 or
2 disks; adding config files to use NFS or other setups should be
easy.

An example of the HTML reporting output available is in
extra/html-example.tar.gz ; untar this out and point a browser at
example/index.html and you can check it out.  (Note, though, that this
example uses counters and a few tracepoint enhancements that aren't in
the upstream kernel -- e.g., "sdb WB pages" shows the cause of page
writeback over the benchmark run.  We'd love to see these counters in
the mainline kernel, as they've been really helpful in debugging
problems, but they are somewhat intrusive.)

Comments welcome!

Thanks,
Curt

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Writeback tests
  2011-07-14 22:52 Writeback tests Curt Wohlgemuth
@ 2011-07-15 15:33 ` Christoph Hellwig
  2011-07-15 23:41 ` Curt Wohlgemuth
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2011-07-15 15:33 UTC (permalink / raw)
  To: Curt Wohlgemuth
  Cc: Wu Fengguang, Jan Kara, Andrew Morton, Christoph Hellwig,
      Dave Chinner, Michael Rubin, linux-fsdevel

On Thu, Jul 14, 2011 at 03:52:31PM -0700, Curt Wohlgemuth wrote:
> example/index.html and you can check it out.  (Note, though, that this
> example uses counters and a few tracepoint enhancements that aren't in
> the upstream kernel -- e.g., "sdb WB pages" shows the cause of page
> writeback over the benchmark run.  We'd love to see these counters in
> the mainline kernel, as they've been really helpful in debugging
> problems, but they are somewhat intrusive.)

Do you have a pointer to the required patches?

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Writeback tests
  2011-07-15 15:33 ` Christoph Hellwig
@ 2011-07-15 23:41 ` Curt Wohlgemuth
  2011-07-15 23:44 ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Curt Wohlgemuth @ 2011-07-15 23:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Wu Fengguang, Jan Kara, Andrew Morton, Dave Chinner, Michael Rubin,
      linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 881 bytes --]

Hi Christoph:

On Fri, Jul 15, 2011 at 8:33 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Jul 14, 2011 at 03:52:31PM -0700, Curt Wohlgemuth wrote:
>> example/index.html and you can check it out.  (Note, though, that this
>> example uses counters and a few tracepoint enhancements that aren't in
>> the upstream kernel -- e.g., "sdb WB pages" shows the cause of page
>> writeback over the benchmark run.  We'd love to see these counters in
>> the mainline kernel, as they've been really helpful in debugging
>> problems, but they are somewhat intrusive.)
>
> Do you have a pointer to the required patches?

I created a patch based on 3.0.0-rc7, and pushed it to
extra/0001-writeback-Add-writeback-stats.patch in the git tree at
http://google3-2.osuosl.org/?p=tests/wbtests.git;a=summary .
Attaching it to this mail as well.

Thanks,
Curt

[-- Attachment #2: 0001-writeback-Add-writeback-stats.patch --]
[-- Type: text/x-patch, Size: 21755 bytes --]

From 8e8eec4aa8c0a5b8ea3363be23da406581586cfa Mon Sep 17 00:00:00 2001
From: Curt Wohlgemuth <curtw@google.com>
Date: Fri, 15 Jul 2011 15:50:13 -0700
Subject: [PATCH 1/2] writeback: Add writeback stats

This creates a global file, /proc/writeback/stats which displays
global data about the writeback subsystem:

  $ cat /proc/writeback/stats
  call: balance_dirty_pages  2253
  call: background_writeout  45
  call: try_to_free_pages    0
  call: sync                 14
  call: kupdate              228
  call: shrink_page_list     0
  call: fdatawrite           6755
  call: laptop_periodic      0
  call: free_more_memory     0
  page: balance_dirty_pages  455830
  page: background_writeout  391568
  page: try_to_free_pages    0
  page: sync                 133
  page: kupdate              1158779
  page: shrink_page_list     0
  page: fdatawrite           777320
  page: laptop_periodic      0
  page: free_more_memory     0
  periodic writeback         241
  single inode wait          0
  writeback_wb wait          7
  metadata pages cleaned     19473

Per-BDI stats are available as well, in
/sys/block/<device>/bdi/writeback_stats

Signed-off-by: Curt Wohlgemuth <curtw@google.com>
---
 fs/buffer.c                 |    2 +-
 fs/fs-writeback.c           |   37 ++++++++++--
 fs/proc/Makefile            |    1 +
 fs/proc/internal.h          |    1 +
 fs/proc/proc_writeback.c    |  145 +++++++++++++++++++++++++++++++++++++++++++
 fs/proc/root.c              |    1 +
 fs/sync.c                   |    2 +-
 include/linux/backing-dev.h |    8 ++-
 include/linux/writeback.h   |   50 +++++++++++++++-
 mm/backing-dev.c            |   41 ++++++++++++
 mm/filemap.c                |    7 ++
 mm/page-writeback.c         |    7 ++-
 mm/vmscan.c                 |   10 +++-
 13 files changed, 301 insertions(+), 11 deletions(-)
 create mode 100644 fs/proc/proc_writeback.c

diff --git a/fs/buffer.c b/fs/buffer.c
index 1a80b04..cfca642 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -285,7 +285,7 @@ static void free_more_memory(void)
 	struct zone *zone;
 	int nid;
 
-	wakeup_flusher_threads(1024);
+	wakeup_flusher_threads(1024, WB_STAT_FREE_MORE_MEM);
 	yield();
 
 	for_each_online_node(nid) {
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index
0f015a0..472ec44 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -39,6 +39,7 @@ struct wb_writeback_work { unsigned int for_kupdate:1; unsigned int range_cyclic:1; unsigned int for_background:1; + int why; struct list_head list; /* pending work list */ struct completion *done; /* set if the caller waits */ @@ -113,7 +114,7 @@ static void bdi_queue_work(struct backing_dev_info *bdi, static void __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages, - bool range_cyclic) + bool range_cyclic, enum wb_stats stat) { struct wb_writeback_work *work; @@ -133,6 +134,7 @@ __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages, work->sync_mode = WB_SYNC_NONE; work->nr_pages = nr_pages; work->range_cyclic = range_cyclic; + work->why = stat; bdi_queue_work(bdi, work); } @@ -148,9 +150,10 @@ __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages, * completion. Caller need not hold sb s_umount semaphore. * */ -void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages) +void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages, + enum wb_stats stat) { - __bdi_start_writeback(bdi, nr_pages, true); + __bdi_start_writeback(bdi, nr_pages, true, stat); } /** @@ -374,6 +377,8 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc) /* * It's a data-integrity sync. We must wait. 
*/ + bdi_writeback_stat_inc(inode->i_mapping->backing_dev_info, + WB_STAT_SINGLE_INODE_WAIT); inode_wait_for_writeback(inode); } @@ -501,6 +506,7 @@ static int writeback_sb_inodes(struct super_block *sb, struct bdi_writeback *wb, { while (!list_empty(&wb->b_io)) { long pages_skipped; + int req, wrote; struct inode *inode = wb_inode(wb->b_io.prev); if (inode->i_sb != sb) { @@ -546,7 +552,9 @@ static int writeback_sb_inodes(struct super_block *sb, struct bdi_writeback *wb, __iget(inode); pages_skipped = wbc->pages_skipped; + req = wbc->nr_to_write; writeback_single_inode(inode, wbc); + wrote = req - wbc->nr_to_write; if (wbc->pages_skipped != pages_skipped) { /* * writeback is not making progress due to locked @@ -554,6 +562,9 @@ static int writeback_sb_inodes(struct super_block *sb, struct bdi_writeback *wb, */ redirty_tail(inode); } + if (inode->i_ino == 0) + bdi_writeback_stat_add(wb->bdi, + WB_STAT_METADATA_PAGES_CLEANED, wrote); spin_unlock(&inode->i_lock); spin_unlock(&inode_wb_list_lock); iput(inode); @@ -688,6 +699,8 @@ static long wb_writeback(struct bdi_writeback *wb, else write_chunk = LONG_MAX; + bdi_writeback_stat_inc(wb->bdi, work->why); + wbc.wb_start = jiffies; /* livelock avoidance */ for (;;) { /* @@ -724,6 +737,12 @@ static long wb_writeback(struct bdi_writeback *wb, writeback_inodes_wb(wb, &wbc); trace_wbc_writeback_written(&wbc, wb->bdi); + if (work->why < WB_STAT_PG_COUNT_BASE) { + bdi_writeback_stat_add(wb->bdi, + work->why + WB_STAT_PG_COUNT_BASE, + write_chunk - wbc.nr_to_write); + } + work->nr_pages -= write_chunk - wbc.nr_to_write; wrote += write_chunk - wbc.nr_to_write; @@ -752,6 +771,9 @@ static long wb_writeback(struct bdi_writeback *wb, inode = wb_inode(wb->b_more_io.prev); trace_wbc_writeback_wait(&wbc, wb->bdi); spin_lock(&inode->i_lock); + bdi_writeback_stat_inc( + inode->i_mapping->backing_dev_info, + WB_STAT_WRITEBACK_WB_WAIT); inode_wait_for_writeback(inode); spin_unlock(&inode->i_lock); } @@ -799,6 +821,7 @@ static long 
wb_check_background_flush(struct bdi_writeback *wb) .sync_mode = WB_SYNC_NONE, .for_background = 1, .range_cyclic = 1, + .why = WB_STAT_BG_WRITEOUT, }; return wb_writeback(wb, &work); @@ -832,6 +855,7 @@ static long wb_check_old_data_flush(struct bdi_writeback *wb) .sync_mode = WB_SYNC_NONE, .for_kupdate = 1, .range_cyclic = 1, + .why = WB_STAT_KUPDATE, }; return wb_writeback(wb, &work); @@ -910,6 +934,7 @@ int bdi_writeback_thread(void *data) */ del_timer(&wb->wakeup_timer); + bdi_writeback_stat_inc(bdi, WB_STAT_PERIODIC); pages_written = wb_do_writeback(wb, 0); trace_writeback_pages_written(pages_written); @@ -950,7 +975,7 @@ int bdi_writeback_thread(void *data) * Start writeback of `nr_pages' pages. If `nr_pages' is zero, write back * the whole world. */ -void wakeup_flusher_threads(long nr_pages) +void wakeup_flusher_threads(long nr_pages, enum wb_stats stat) { struct backing_dev_info *bdi; @@ -963,7 +988,7 @@ void wakeup_flusher_threads(long nr_pages) list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) { if (!bdi_has_dirty_io(bdi)) continue; - __bdi_start_writeback(bdi, nr_pages, false); + __bdi_start_writeback(bdi, nr_pages, false, stat); } rcu_read_unlock(); } @@ -1192,6 +1217,7 @@ void writeback_inodes_sb_nr(struct super_block *sb, unsigned long nr) .sync_mode = WB_SYNC_NONE, .done = &done, .nr_pages = nr, + .why = WB_STAT_SYNC, /* XXX: Not always correct */ }; WARN_ON(!rwsem_is_locked(&sb->s_umount)); @@ -1270,6 +1296,7 @@ void sync_inodes_sb(struct super_block *sb) .nr_pages = LONG_MAX, .range_cyclic = 0, .done = &done, + .why = WB_STAT_SYNC, }; WARN_ON(!rwsem_is_locked(&sb->s_umount)); diff --git a/fs/proc/Makefile b/fs/proc/Makefile index c1c7293..edac008 100644 --- a/fs/proc/Makefile +++ b/fs/proc/Makefile @@ -21,6 +21,7 @@ proc-y += uptime.o proc-y += version.o proc-y += softirqs.o proc-y += namespaces.o +proc-y += proc_writeback.o proc-$(CONFIG_PROC_SYSCTL) += proc_sysctl.o proc-$(CONFIG_NET) += proc_net.o proc-$(CONFIG_PROC_KCORE) += kcore.o diff 
--git a/fs/proc/internal.h b/fs/proc/internal.h index 7838e5c..43797ec 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -23,6 +23,7 @@ extern int proc_net_init(void); static inline int proc_net_init(void) { return 0; } #endif +extern void proc_writeback_init(void); struct vmalloc_info { unsigned long used; unsigned long largest_chunk; diff --git a/fs/proc/proc_writeback.c b/fs/proc/proc_writeback.c new file mode 100644 index 0000000..4614697 --- /dev/null +++ b/fs/proc/proc_writeback.c @@ -0,0 +1,145 @@ +/* + * linux/fs/proc/proc_writeback.c + */ +#include <linux/types.h> +#include <linux/errno.h> +#include <linux/init.h> +#include <linux/seq_file.h> +#include <linux/mm.h> +#include <linux/percpu.h> +#include <linux/writeback.h> +#include <linux/uaccess.h> +#include <linux/backing-dev.h> +#include "internal.h" + +struct writeback_stats *writeback_sys_stats; + +enum writeback_op { + WB_STATS_OP, +}; + +static const char *wb_stats_labels[WB_STAT_MAX] = { + [WB_STAT_BALANCE_DIRTY] = "call: balance_dirty_pages", + [WB_STAT_BG_WRITEOUT] = "call: background_writeout", + [WB_STAT_TRY_TO_FREE_PAGES] = "call: try_to_free_pages", + [WB_STAT_SYNC] = "call: sync", + [WB_STAT_KUPDATE] = "call: kupdate", + [WB_STAT_SHRINK_PAGE_LIST] = "call: shrink_page_list", + [WB_STAT_FDATAWRITE] = "call: fdatawrite", + [WB_STAT_LAPTOP_TIMER] = "call: laptop_periodic", + [WB_STAT_FREE_MORE_MEM] = "call: free_more_memory", + + [WB_STAT_PG_COUNT_BASE + WB_STAT_BALANCE_DIRTY] = + "page: balance_dirty_pages", + [WB_STAT_PG_COUNT_BASE + WB_STAT_BG_WRITEOUT] = + "page: background_writeout", + [WB_STAT_PG_COUNT_BASE + WB_STAT_TRY_TO_FREE_PAGES] = + "page: try_to_free_pages", + [WB_STAT_PG_COUNT_BASE + WB_STAT_SYNC] = + "page: sync", + [WB_STAT_PG_COUNT_BASE + WB_STAT_KUPDATE] = + "page: kupdate", + [WB_STAT_PG_COUNT_BASE + WB_STAT_SHRINK_PAGE_LIST] = + "page: shrink_page_list", + [WB_STAT_PG_COUNT_BASE + WB_STAT_FDATAWRITE] = + "page: fdatawrite", + [WB_STAT_PG_COUNT_BASE + 
WB_STAT_LAPTOP_TIMER] = + "page: laptop_periodic", + [WB_STAT_PG_COUNT_BASE + WB_STAT_FREE_MORE_MEM] = + "page: free_more_memory", + + [WB_STAT_PERIODIC] = "periodic writeback", + [WB_STAT_SINGLE_INODE_WAIT] = "single inode wait", + [WB_STAT_WRITEBACK_WB_WAIT] = "writeback_wb wait", + [WB_STAT_METADATA_PAGES_CLEANED] = "metadata pages cleaned", +}; + +static void writeback_stats_collect(struct writeback_stats *src, + struct writeback_stats *target) +{ + int cpu; + for_each_online_cpu(cpu) { + int stat; + struct writeback_stats *stats = per_cpu_ptr(src, cpu); + for (stat = 0; stat < WB_STAT_MAX; stat++) + target->stats[stat] += stats->stats[stat]; + } +} + +static size_t writeback_stats_to_str(struct writeback_stats *stats, + char *buf, size_t len) +{ + int bufsize = len - 1; + int i, printed = 0; + for (i = 0; i < WB_STAT_MAX; i++) { + const char *label = wb_stats_labels[i]; + if (label == NULL) + continue; + printed += snprintf(buf + printed, bufsize - printed, + "%-32s %10llu\n", label, stats->stats[i]); + if (printed >= bufsize) { + buf[len - 1] = '\n'; + return len; + } + } + + buf[printed - 1] = '\n'; + return printed; +} + +size_t writeback_stats_print(struct writeback_stats *stats, + char *buf, size_t len) +{ + struct writeback_stats total; + memset(&total, 0, sizeof(total)); + writeback_stats_collect(stats, &total); + return writeback_stats_to_str(&total, buf, len); +} + + +static int writeback_seq_show(struct seq_file *m, void *data) +{ + char *buf; + size_t size; + switch ((enum writeback_op)m->private) { + case WB_STATS_OP: + size = seq_get_buf(m, &buf); + if (size == 0) + return 0; + size = writeback_stats_print(writeback_sys_stats, buf, size); + seq_commit(m, size); + break; + default: + break; + } + + return 0; +} + +static int writeback_open(struct inode *inode, struct file *file) +{ + return single_open(file, writeback_seq_show, PDE(inode)->data); +} + +static const struct file_operations writeback_ops = { + .open = writeback_open, + .read = 
seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + + +void __init proc_writeback_init(void) +{ + struct proc_dir_entry *base_dir; + base_dir = proc_mkdir("writeback", NULL); + if (base_dir == NULL) { + printk(KERN_ERR "Creating /proc/writeback/ failed"); + return; + } + + writeback_sys_stats = alloc_percpu(struct writeback_stats); + + proc_create_data("stats", S_IRUGO|S_IWUSR, base_dir, + &writeback_ops, (void *)WB_STATS_OP); +} diff --git a/fs/proc/root.c b/fs/proc/root.c index d6c3b41..a44e166 100644 --- a/fs/proc/root.c +++ b/fs/proc/root.c @@ -125,6 +125,7 @@ void __init proc_root_init(void) #endif proc_mkdir("bus", NULL); proc_sys_init(); + proc_writeback_init(); } static int proc_root_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat diff --git a/fs/sync.c b/fs/sync.c index c38ec16..f4c7085 100644 --- a/fs/sync.c +++ b/fs/sync.c @@ -98,7 +98,7 @@ static void sync_filesystems(int wait) */ SYSCALL_DEFINE0(sync) { - wakeup_flusher_threads(0); + wakeup_flusher_threads(0, WB_STAT_SYNC); sync_filesystems(0); sync_filesystems(1); if (unlikely(laptop_mode)) diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index 96f4094..648b06f 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -70,6 +70,7 @@ struct backing_dev_info { char *name; struct percpu_counter bdi_stat[NR_BDI_STAT_ITEMS]; + struct writeback_stats *wb_stat; struct prop_local_percpu completions; int dirty_exceeded; @@ -100,7 +101,8 @@ int bdi_register(struct backing_dev_info *bdi, struct device *parent, int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev); void bdi_unregister(struct backing_dev_info *bdi); int bdi_setup_and_register(struct backing_dev_info *, char *, unsigned int); -void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages); +void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages, + enum wb_stats stat); void bdi_start_background_writeback(struct backing_dev_info *bdi); int 
bdi_writeback_thread(void *data); int bdi_has_dirty_io(struct backing_dev_info *bdi); @@ -181,6 +183,10 @@ static inline s64 bdi_stat_sum(struct backing_dev_info *bdi, return sum; } +void bdi_writeback_stat_inc(struct backing_dev_info *bdi, enum wb_stats stat); +void bdi_writeback_stat_add(struct backing_dev_info *bdi, enum wb_stats stat, + unsigned long value); + extern void bdi_writeout_inc(struct backing_dev_info *bdi); /* diff --git a/include/linux/writeback.h b/include/linux/writeback.h index 17e7ccc..6fe2247 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -20,6 +20,54 @@ enum writeback_sync_modes { }; /* + * why this writeback was initiated + */ +enum wb_stats { + WB_STAT_BALANCE_DIRTY, + WB_STAT_BG_WRITEOUT, + WB_STAT_TRY_TO_FREE_PAGES, + WB_STAT_SYNC, + WB_STAT_KUPDATE, + WB_STAT_SHRINK_PAGE_LIST, + WB_STAT_FDATAWRITE, + WB_STAT_LAPTOP_TIMER, + WB_STAT_FREE_MORE_MEM, + + /* + * For event that you want both call count and pages associated + * with the call, add them above. + * + * [0 : WB_STAT_PAGES_COUNT_BASE - 1] stores number of calls + * pages count will be stored starting from WB_STAT_PAGES_COUNT_BASE + */ + WB_STAT_PG_COUNT_BASE, + + WB_STAT_PERIODIC = WB_STAT_PG_COUNT_BASE * 2, + WB_STAT_SINGLE_INODE_WAIT, + WB_STAT_WRITEBACK_WB_WAIT, + WB_STAT_METADATA_PAGES_CLEANED, + WB_STAT_MAX, +}; + +struct writeback_stats { + u64 stats[WB_STAT_MAX]; +}; + +extern struct writeback_stats *writeback_sys_stats; + +static inline struct writeback_stats *writeback_stats_alloc(void) +{ + return alloc_percpu(struct writeback_stats); +} + +static inline void writeback_stats_free(struct writeback_stats *stats) +{ + free_percpu(stats); +} + +size_t writeback_stats_print(struct writeback_stats *, char *buf, size_t); + +/* * A control structure which tells the writeback code what to do. These are * always on the stack, and hence need no locking. They are always initialised * in a manner such that unspecified fields are set to zero. 
@@ -65,7 +113,7 @@ void sync_inodes_sb(struct super_block *); void writeback_inodes_wb(struct bdi_writeback *wb, struct writeback_control *wbc); long wb_do_writeback(struct bdi_writeback *wb, int force_wait); -void wakeup_flusher_threads(long nr_pages); +void wakeup_flusher_threads(long nr_pages, enum wb_stats stat); /* writeback.h requires fs.h; it, too, is not included from here. */ static inline void wait_on_inode(struct inode *inode) diff --git a/mm/backing-dev.c b/mm/backing-dev.c index f032e6e..e69e492 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -138,6 +138,13 @@ static inline void bdi_debug_unregister(struct backing_dev_info *bdi) } #endif +static ssize_t writeback_stats_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct backing_dev_info *bdi = dev_get_drvdata(dev); + return writeback_stats_print(bdi->wb_stat, buf, PAGE_SIZE); +} + static ssize_t read_ahead_kb_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) @@ -210,6 +217,7 @@ static struct device_attribute bdi_dev_attrs[] = { __ATTR_RW(read_ahead_kb), __ATTR_RW(min_ratio), __ATTR_RW(max_ratio), + __ATTR_RO(writeback_stats), __ATTR_NULL, }; @@ -652,11 +660,18 @@ int bdi_init(struct backing_dev_info *bdi) goto err; } + bdi->wb_stat = writeback_stats_alloc(); + if (bdi->wb_stat == NULL) { + err = -ENOMEM; + goto err; + } + bdi->dirty_exceeded = 0; err = prop_local_init_percpu(&bdi->completions); if (err) { err: + writeback_stats_free(bdi->wb_stat); while (i--) percpu_counter_destroy(&bdi->bdi_stat[i]); } @@ -688,6 +703,8 @@ void bdi_destroy(struct backing_dev_info *bdi) for (i = 0; i < NR_BDI_STAT_ITEMS; i++) percpu_counter_destroy(&bdi->bdi_stat[i]); + writeback_stats_free(bdi->wb_stat); + prop_local_destroy_percpu(&bdi->completions); } EXPORT_SYMBOL(bdi_destroy); @@ -830,3 +847,27 @@ out: return ret; } EXPORT_SYMBOL(wait_iff_congested); + +void bdi_writeback_stat_add(struct backing_dev_info *bdi, enum wb_stats stat, + unsigned 
long value) +{ + if (bdi) { + struct writeback_stats *stats = bdi->wb_stat; + + BUG_ON(stat >= WB_STAT_MAX); + preempt_disable(); + stats = per_cpu_ptr(stats, smp_processor_id()); + stats->stats[stat] += value; + if (likely(writeback_sys_stats)) { + stats = per_cpu_ptr(writeback_sys_stats, + smp_processor_id()); + stats->stats[stat] += value; + } + preempt_enable(); + } +} + +void bdi_writeback_stat_inc(struct backing_dev_info *bdi, enum wb_stats stat) +{ + bdi_writeback_stat_add(bdi, stat, 1); +} diff --git a/mm/filemap.c b/mm/filemap.c index a8251a8..6ac8ce2 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -218,7 +218,14 @@ int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start, if (!mapping_cap_writeback_dirty(mapping)) return 0; + bdi_writeback_stat_inc(mapping->backing_dev_info, + WB_STAT_FDATAWRITE); + ret = do_writepages(mapping, &wbc); + + bdi_writeback_stat_add(mapping->backing_dev_info, + WB_STAT_FDATAWRITE + WB_STAT_PG_COUNT_BASE, + LONG_MAX - wbc.nr_to_write); return ret; } diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 31f6988..5333968 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -559,7 +559,11 @@ static void balance_dirty_pages(struct address_space *mapping, */ trace_wbc_balance_dirty_start(&wbc, bdi); if (bdi_nr_reclaimable > bdi_thresh) { + bdi_writeback_stat_inc(bdi, WB_STAT_BALANCE_DIRTY); writeback_inodes_wb(&bdi->wb, &wbc); + bdi_writeback_stat_add(bdi, + WB_STAT_BALANCE_DIRTY + WB_STAT_PG_COUNT_BASE, + write_chunk - wbc.nr_to_write); pages_written += write_chunk - wbc.nr_to_write; trace_wbc_balance_dirty_written(&wbc, bdi); if (pages_written >= write_chunk) @@ -703,7 +707,8 @@ void laptop_mode_timer_fn(unsigned long data) * threshold */ if (bdi_has_dirty_io(&q->backing_dev_info)) - bdi_start_writeback(&q->backing_dev_info, nr_pages); + bdi_start_writeback(&q->backing_dev_info, nr_pages, + WB_STAT_LAPTOP_TIMER); } /* diff --git a/mm/vmscan.c b/mm/vmscan.c index 5ed24b9..04cd8b1 100644 --- 
a/mm/vmscan.c +++ b/mm/vmscan.c @@ -447,6 +447,8 @@ static pageout_t pageout(struct page *page, struct address_space *mapping, }; SetPageReclaim(page); + bdi_writeback_stat_inc(mapping->backing_dev_info, + WB_STAT_SHRINK_PAGE_LIST); res = mapping->a_ops->writepage(page, &wbc); if (res < 0) handle_write_error(mapping, page, res); @@ -454,6 +456,11 @@ static pageout_t pageout(struct page *page, struct address_space *mapping, ClearPageReclaim(page); return PAGE_ACTIVATE; } + if (res >= 0) { + bdi_writeback_stat_inc(mapping->backing_dev_info, + WB_STAT_SHRINK_PAGE_LIST + + WB_STAT_PG_COUNT_BASE); + } /* * Wait on writeback if requested to. This happens when @@ -2131,7 +2138,8 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist, */ writeback_threshold = sc->nr_to_reclaim + sc->nr_to_reclaim / 2; if (total_scanned > writeback_threshold) { - wakeup_flusher_threads(laptop_mode ? 0 : total_scanned); + wakeup_flusher_threads(laptop_mode ? 0 : total_scanned, + WB_STAT_TRY_TO_FREE_PAGES); sc->may_writepage = 1; } -- 1.7.3.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
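
The counter layout used by the patch above may be easier to follow in isolation. What follows is a minimal userspace sketch, not kernel code: it mimics how a writeback reason indexes a call counter in the first `WB_STAT_PG_COUNT_BASE` slots and a matching page counter at `reason + WB_STAT_PG_COUNT_BASE`, and how a reader sums the per-CPU copies the way `writeback_stats_collect()` does. The enum is trimmed to three reasons, and `NR_FAKE_CPUS`, `wb_account()` and `wb_total()` are hypothetical names introduced only for this illustration. The real patch bumps the call counter and the page counter at different points; the sketch folds them into one helper for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Trimmed-down, illustrative copy of the patch's enum wb_stats. */
enum wb_stats {
	WB_STAT_BALANCE_DIRTY,
	WB_STAT_BG_WRITEOUT,
	WB_STAT_SYNC,
	/* Reasons above get a call slot and, BASE slots later, a page slot. */
	WB_STAT_PG_COUNT_BASE,
	WB_STAT_PERIODIC = WB_STAT_PG_COUNT_BASE * 2,
	WB_STAT_MAX,
};

/* Assumption: a fixed array stands in for alloc_percpu() storage. */
#define NR_FAKE_CPUS 4

struct writeback_stats {
	uint64_t stats[WB_STAT_MAX];
};

static struct writeback_stats fake_percpu[NR_FAKE_CPUS];

/* Record one writeback call for reason `why` and the pages it wrote. */
static void wb_account(int cpu, enum wb_stats why, uint64_t pages)
{
	assert(cpu >= 0 && cpu < NR_FAKE_CPUS);
	if (why >= WB_STAT_PG_COUNT_BASE)
		return;	/* only reasons have a paired page counter */
	fake_percpu[cpu].stats[why] += 1;
	fake_percpu[cpu].stats[why + WB_STAT_PG_COUNT_BASE] += pages;
}

/* Sum one counter across all per-CPU copies. */
static uint64_t wb_total(int stat)
{
	uint64_t sum = 0;
	int cpu;

	assert(stat >= 0 && stat < WB_STAT_MAX);
	for (cpu = 0; cpu < NR_FAKE_CPUS; cpu++)
		sum += fake_percpu[cpu].stats[stat];
	return sum;
}
```

Calling `wb_account()` from two different fake CPUs for the same reason and then reading `wb_total(reason)` and `wb_total(reason + WB_STAT_PG_COUNT_BASE)` gives the "call:" and "page:" values for that reason. Writers only touch their own CPU's copy, which is why the kernel version needs no lock on the update path.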
* Re: Writeback tests
  2011-07-15 23:41 ` Curt Wohlgemuth
@ 2011-07-15 23:44 ` Christoph Hellwig
  2011-07-19 16:46 ` Curt Wohlgemuth
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2011-07-15 23:44 UTC (permalink / raw)
  To: Curt Wohlgemuth
  Cc: Christoph Hellwig, Wu Fengguang, Jan Kara, Andrew Morton,
      Dave Chinner, Michael Rubin, linux-fsdevel

On Fri, Jul 15, 2011 at 04:41:38PM -0700, Curt Wohlgemuth wrote:
>   $ cat /proc/writeback/stats
>   call: balance_dirty_pages  2253
>   call: background_writeout  45
>   call: try_to_free_pages    0
>   call: sync                 14
>   call: kupdate              228
>   call: shrink_page_list     0
>   call: fdatawrite           6755
>   call: laptop_periodic      0
>   call: free_more_memory     0
>   page: balance_dirty_pages  455830
>   page: background_writeout  391568
>   page: try_to_free_pages    0
>   page: sync                 133
>   page: kupdate              1158779
>   page: shrink_page_list     0
>   page: fdatawrite           777320
>   page: laptop_periodic      0
>   page: free_more_memory     0
>   periodic writeback         241
>   single inode wait          0
>   writeback_wb wait          7
>   metadata pages cleaned     19473

A fair amount of the page stats should be provided by Wu and Jan's
recent patches, shouldn't they?  I don't think the calls stats
are such a good idea.  We can trivially get these stats by using
perf - just place a probe at the function and then count it using
perf record.

But it would be good to get the remaining page stats in, as they
seem pretty useful.

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Writeback tests
  2011-07-15 23:44 ` Christoph Hellwig
@ 2011-07-19 16:46 ` Curt Wohlgemuth
  2011-07-20 21:49 ` Curt Wohlgemuth
  0 siblings, 1 reply; 7+ messages in thread
From: Curt Wohlgemuth @ 2011-07-19 16:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Wu Fengguang, Jan Kara, Andrew Morton, Dave Chinner, Michael Rubin,
      linux-fsdevel

On Fri, Jul 15, 2011 at 4:44 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Fri, Jul 15, 2011 at 04:41:38PM -0700, Curt Wohlgemuth wrote:
>>   $ cat /proc/writeback/stats
>>   call: balance_dirty_pages  2253
>>   call: background_writeout  45
>>   call: try_to_free_pages    0
>>   call: sync                 14
>>   call: kupdate              228
>>   call: shrink_page_list     0
>>   call: fdatawrite           6755
>>   call: laptop_periodic      0
>>   call: free_more_memory     0
>>   page: balance_dirty_pages  455830
>>   page: background_writeout  391568
>>   page: try_to_free_pages    0
>>   page: sync                 133
>>   page: kupdate              1158779
>>   page: shrink_page_list     0
>>   page: fdatawrite           777320
>>   page: laptop_periodic      0
>>   page: free_more_memory     0
>>   periodic writeback         241
>>   single inode wait          0
>>   writeback_wb wait          7
>>   metadata pages cleaned     19473
>
> A fair amount of the page stats should be provided by Wu and Jan's
> recent patches, shouldn't they?  I don't think the calls stats
> are such a good idea.  We can trivially get these stats by using
> perf - just place a probe at the function and then count it using
> perf record.

Well, sure, you can get all these stats via perf/ftrace/whatever, if
you want to place the probes in the right spot, and count them up.  I
personally think that having total counts (that are monotonically
increasing) always available is pretty convenient, including the call
counts.  But I agree, the page counts and the "metadata pages cleaned"
are the most valuable.

I looked through Jan's and Fengguang's recent patches for such stats,
but didn't see anything -- though I could easily have missed them.

> But it would be good to get the remaining page stats in, as they
> seem pretty useful.

I'll resend another patch without the call counts, see what people think.

Thanks,
Curt

^ permalink raw reply [flat|nested] 7+ messages in thread
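
One practical consequence of monotonically increasing totals, as argued in the message above, is that a test harness can read the stats file twice and subtract to get the activity during a benchmark run. The following is a rough userspace sketch of that idea, assuming the "label, padding, value" line format shown earlier in the thread; `wb_parse_line()` and `wb_delta()` are hypothetical helper names and are not part of the patch or of the wbtests tree.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split one stats line such as "page: kupdate      1158779\n" into its
 * label and counter value.  Labels may contain spaces, so the number is
 * located by scanning backwards from the end of the line.
 * Returns 0 on success, -1 if the line is malformed or the label
 * does not fit in `label`.
 */
static int wb_parse_line(const char *line, char *label, size_t label_len,
			 unsigned long long *value)
{
	size_t end, num_start, label_end;

	assert(line && label && value);

	/* Drop trailing whitespace, including the newline. */
	end = strlen(line);
	while (end > 0 && isspace((unsigned char)line[end - 1]))
		end--;

	/* Walk back over the digits of the counter. */
	num_start = end;
	while (num_start > 0 && isdigit((unsigned char)line[num_start - 1]))
		num_start--;
	if (num_start == end || num_start == 0)
		return -1;	/* no number, or nothing before it */
	if (!isspace((unsigned char)line[num_start - 1]))
		return -1;	/* digits are glued to the label */

	/* Drop the padding between label and number. */
	label_end = num_start;
	while (label_end > 0 && isspace((unsigned char)line[label_end - 1]))
		label_end--;
	if (label_end == 0 || label_end >= label_len)
		return -1;

	memcpy(label, line, label_end);
	label[label_end] = '\0';
	*value = strtoull(line + num_start, NULL, 10);
	return 0;
}

/* Activity between two samples of the same monotonic counter. */
static unsigned long long wb_delta(unsigned long long before,
				   unsigned long long after)
{
	assert(after >= before);
	return after - before;
}
```

A harness would typically read the file once before starting the workload and once after it finishes, parse both snapshots line by line, and report `wb_delta()` per label. Counters that wrap or get reset between samples are not handled here.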
* Re: Writeback tests
  2011-07-19 16:46 ` Curt Wohlgemuth
@ 2011-07-20 21:49 ` Curt Wohlgemuth
  2011-08-10 16:54 ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Curt Wohlgemuth @ 2011-07-20 21:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Wu Fengguang, Jan Kara, Andrew Morton, Dave Chinner, Michael Rubin,
      linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 2769 bytes --]

Hi Christoph:

On Tue, Jul 19, 2011 at 9:46 AM, Curt Wohlgemuth <curtw@google.com> wrote:
> On Fri, Jul 15, 2011 at 4:44 PM, Christoph Hellwig <hch@infradead.org> wrote:
>> On Fri, Jul 15, 2011 at 04:41:38PM -0700, Curt Wohlgemuth wrote:
>>>   $ cat /proc/writeback/stats
>>>   call: balance_dirty_pages  2253
>>>   call: background_writeout  45
>>>   call: try_to_free_pages    0
>>>   call: sync                 14
>>>   call: kupdate              228
>>>   call: shrink_page_list     0
>>>   call: fdatawrite           6755
>>>   call: laptop_periodic      0
>>>   call: free_more_memory     0
>>>   page: balance_dirty_pages  455830
>>>   page: background_writeout  391568
>>>   page: try_to_free_pages    0
>>>   page: sync                 133
>>>   page: kupdate              1158779
>>>   page: shrink_page_list     0
>>>   page: fdatawrite           777320
>>>   page: laptop_periodic      0
>>>   page: free_more_memory     0
>>>   periodic writeback         241
>>>   single inode wait          0
>>>   writeback_wb wait          7
>>>   metadata pages cleaned     19473
>>
>> A fair amount of the page stats should be provided by Wu and Jan's
>> recent patches, shouldn't they?  I don't think the calls stats
>> are such a good idea.  We can trivially get these stats by using
>> perf - just place a probe at the function and then count it using
>> perf record.
>
> Well, sure, you can get all these stats via perf/ftrace/whatever, if
> you want to place the probes in the right spot, and count them up.  I
> personally think that having total counts (that are monotonically
> increasing) always available is pretty convenient, including the call
> counts.  But I agree, the page counts and the "metadata pages cleaned"
> are the most valuable.
>
> I looked through Jan's and Fengguang's recent patches for such stats,
> but didn't see anything -- though I could easily have missed them.
>
>> But it would be good to get the remaining page stats in, as they
>> seem pretty useful.
>
> I'll resend another patch without the call counts, see what people think.

Here's a version of the writeback stats patch without call counts,
which is hence a bit simpler.  I retained the 'why' field in a
wb_writeback_work because it's simple :-) .

This patch is based on Fengguang's tree at
git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback.git , not
on the kernel.org tree.

Thanks,
Curt

[-- Attachment #2: 0001-writeback-Add-writeback-stats.patch --]
[-- Type: text/x-patch, Size: 21755 bytes --]

From 8e8eec4aa8c0a5b8ea3363be23da406581586cfa Mon Sep 17 00:00:00 2001
From: Curt Wohlgemuth <curtw@google.com>
Date: Fri, 15 Jul 2011 15:50:13 -0700
Subject: [PATCH 1/2] writeback: Add writeback stats

This creates a global file, /proc/writeback/stats which displays
global data about the writeback subsystem:

  $ cat /proc/writeback/stats
  call: balance_dirty_pages  2253
  call: background_writeout  45
  call: try_to_free_pages    0
  call: sync                 14
  call: kupdate              228
  call: shrink_page_list     0
  call: fdatawrite           6755
  call: laptop_periodic      0
  call: free_more_memory     0
  page: balance_dirty_pages  455830
  page: background_writeout  391568
  page: try_to_free_pages    0
  page: sync                 133
  page: kupdate              1158779
  page: shrink_page_list     0
  page: fdatawrite           777320
  page: laptop_periodic      0
  page: free_more_memory     0
  periodic writeback         241
  single inode wait          0
  writeback_wb wait          7
  metadata pages cleaned     19473

Per-BDI stats are available as well, in
/sys/block/<device>/bdi/writeback_stats

Signed-off-by: Curt Wohlgemuth <curtw@google.com>
---
 fs/buffer.c                 |    2 +-
 fs/fs-writeback.c           |   37 ++++++++++--
 fs/proc/Makefile            |    1 +
 fs/proc/internal.h          |    1 +
 fs/proc/proc_writeback.c    |  145 +++++++++++++++++++++++++++++++++++++++++++
 fs/proc/root.c              |    1 +
 fs/sync.c                   |    2 +-
 include/linux/backing-dev.h |    8
++- include/linux/writeback.h | 50 +++++++++++++++- mm/backing-dev.c | 41 ++++++++++++ mm/filemap.c | 7 ++ mm/page-writeback.c | 7 ++- mm/vmscan.c | 10 +++- 13 files changed, 301 insertions(+), 11 deletions(-) create mode 100644 fs/proc/proc_writeback.c diff --git a/fs/buffer.c b/fs/buffer.c index 1a80b04..cfca642 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -285,7 +285,7 @@ static void free_more_memory(void) struct zone *zone; int nid; - wakeup_flusher_threads(1024); + wakeup_flusher_threads(1024, WB_STAT_FREE_MORE_MEM); yield(); for_each_online_node(nid) { diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 0f015a0..472ec44 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -39,6 +39,7 @@ struct wb_writeback_work { unsigned int for_kupdate:1; unsigned int range_cyclic:1; unsigned int for_background:1; + int why; struct list_head list; /* pending work list */ struct completion *done; /* set if the caller waits */ @@ -113,7 +114,7 @@ static void bdi_queue_work(struct backing_dev_info *bdi, static void __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages, - bool range_cyclic) + bool range_cyclic, enum wb_stats stat) { struct wb_writeback_work *work; @@ -133,6 +134,7 @@ __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages, work->sync_mode = WB_SYNC_NONE; work->nr_pages = nr_pages; work->range_cyclic = range_cyclic; + work->why = stat; bdi_queue_work(bdi, work); } @@ -148,9 +150,10 @@ __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages, * completion. Caller need not hold sb s_umount semaphore. * */ -void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages) +void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages, + enum wb_stats stat) { - __bdi_start_writeback(bdi, nr_pages, true); + __bdi_start_writeback(bdi, nr_pages, true, stat); } /** @@ -374,6 +377,8 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc) /* * It's a data-integrity sync. We must wait. 
*/ + bdi_writeback_stat_inc(inode->i_mapping->backing_dev_info, + WB_STAT_SINGLE_INODE_WAIT); inode_wait_for_writeback(inode); } @@ -501,6 +506,7 @@ static int writeback_sb_inodes(struct super_block *sb, struct bdi_writeback *wb, { while (!list_empty(&wb->b_io)) { long pages_skipped; + int req, wrote; struct inode *inode = wb_inode(wb->b_io.prev); if (inode->i_sb != sb) { @@ -546,7 +552,9 @@ static int writeback_sb_inodes(struct super_block *sb, struct bdi_writeback *wb, __iget(inode); pages_skipped = wbc->pages_skipped; + req = wbc->nr_to_write; writeback_single_inode(inode, wbc); + wrote = req - wbc->nr_to_write; if (wbc->pages_skipped != pages_skipped) { /* * writeback is not making progress due to locked @@ -554,6 +562,9 @@ static int writeback_sb_inodes(struct super_block *sb, struct bdi_writeback *wb, */ redirty_tail(inode); } + if (inode->i_ino == 0) + bdi_writeback_stat_add(wb->bdi, + WB_STAT_METADATA_PAGES_CLEANED, wrote); spin_unlock(&inode->i_lock); spin_unlock(&inode_wb_list_lock); iput(inode); @@ -688,6 +699,8 @@ static long wb_writeback(struct bdi_writeback *wb, else write_chunk = LONG_MAX; + bdi_writeback_stat_inc(wb->bdi, work->why); + wbc.wb_start = jiffies; /* livelock avoidance */ for (;;) { /* @@ -724,6 +737,12 @@ static long wb_writeback(struct bdi_writeback *wb, writeback_inodes_wb(wb, &wbc); trace_wbc_writeback_written(&wbc, wb->bdi); + if (work->why < WB_STAT_PG_COUNT_BASE) { + bdi_writeback_stat_add(wb->bdi, + work->why + WB_STAT_PG_COUNT_BASE, + write_chunk - wbc.nr_to_write); + } + work->nr_pages -= write_chunk - wbc.nr_to_write; wrote += write_chunk - wbc.nr_to_write; @@ -752,6 +771,9 @@ static long wb_writeback(struct bdi_writeback *wb, inode = wb_inode(wb->b_more_io.prev); trace_wbc_writeback_wait(&wbc, wb->bdi); spin_lock(&inode->i_lock); + bdi_writeback_stat_inc( + inode->i_mapping->backing_dev_info, + WB_STAT_WRITEBACK_WB_WAIT); inode_wait_for_writeback(inode); spin_unlock(&inode->i_lock); } @@ -799,6 +821,7 @@ static long 
wb_check_background_flush(struct bdi_writeback *wb) .sync_mode = WB_SYNC_NONE, .for_background = 1, .range_cyclic = 1, + .why = WB_STAT_BG_WRITEOUT, }; return wb_writeback(wb, &work); @@ -832,6 +855,7 @@ static long wb_check_old_data_flush(struct bdi_writeback *wb) .sync_mode = WB_SYNC_NONE, .for_kupdate = 1, .range_cyclic = 1, + .why = WB_STAT_KUPDATE, }; return wb_writeback(wb, &work); @@ -910,6 +934,7 @@ int bdi_writeback_thread(void *data) */ del_timer(&wb->wakeup_timer); + bdi_writeback_stat_inc(bdi, WB_STAT_PERIODIC); pages_written = wb_do_writeback(wb, 0); trace_writeback_pages_written(pages_written); @@ -950,7 +975,7 @@ int bdi_writeback_thread(void *data) * Start writeback of `nr_pages' pages. If `nr_pages' is zero, write back * the whole world. */ -void wakeup_flusher_threads(long nr_pages) +void wakeup_flusher_threads(long nr_pages, enum wb_stats stat) { struct backing_dev_info *bdi; @@ -963,7 +988,7 @@ void wakeup_flusher_threads(long nr_pages) list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) { if (!bdi_has_dirty_io(bdi)) continue; - __bdi_start_writeback(bdi, nr_pages, false); + __bdi_start_writeback(bdi, nr_pages, false, stat); } rcu_read_unlock(); } @@ -1192,6 +1217,7 @@ void writeback_inodes_sb_nr(struct super_block *sb, unsigned long nr) .sync_mode = WB_SYNC_NONE, .done = &done, .nr_pages = nr, + .why = WB_STAT_SYNC, /* XXX: Not always correct */ }; WARN_ON(!rwsem_is_locked(&sb->s_umount)); @@ -1270,6 +1296,7 @@ void sync_inodes_sb(struct super_block *sb) .nr_pages = LONG_MAX, .range_cyclic = 0, .done = &done, + .why = WB_STAT_SYNC, }; WARN_ON(!rwsem_is_locked(&sb->s_umount)); diff --git a/fs/proc/Makefile b/fs/proc/Makefile index c1c7293..edac008 100644 --- a/fs/proc/Makefile +++ b/fs/proc/Makefile @@ -21,6 +21,7 @@ proc-y += uptime.o proc-y += version.o proc-y += softirqs.o proc-y += namespaces.o +proc-y += proc_writeback.o proc-$(CONFIG_PROC_SYSCTL) += proc_sysctl.o proc-$(CONFIG_NET) += proc_net.o proc-$(CONFIG_PROC_KCORE) += kcore.o diff 
--git a/fs/proc/internal.h b/fs/proc/internal.h index 7838e5c..43797ec 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -23,6 +23,7 @@ extern int proc_net_init(void); static inline int proc_net_init(void) { return 0; } #endif +extern void proc_writeback_init(void); struct vmalloc_info { unsigned long used; unsigned long largest_chunk; diff --git a/fs/proc/proc_writeback.c b/fs/proc/proc_writeback.c new file mode 100644 index 0000000..4614697 --- /dev/null +++ b/fs/proc/proc_writeback.c @@ -0,0 +1,145 @@ +/* + * linux/fs/proc/proc_writeback.c + */ +#include <linux/types.h> +#include <linux/errno.h> +#include <linux/init.h> +#include <linux/seq_file.h> +#include <linux/mm.h> +#include <linux/percpu.h> +#include <linux/writeback.h> +#include <linux/uaccess.h> +#include <linux/backing-dev.h> +#include "internal.h" + +struct writeback_stats *writeback_sys_stats; + +enum writeback_op { + WB_STATS_OP, +}; + +static const char *wb_stats_labels[WB_STAT_MAX] = { + [WB_STAT_BALANCE_DIRTY] = "call: balance_dirty_pages", + [WB_STAT_BG_WRITEOUT] = "call: background_writeout", + [WB_STAT_TRY_TO_FREE_PAGES] = "call: try_to_free_pages", + [WB_STAT_SYNC] = "call: sync", + [WB_STAT_KUPDATE] = "call: kupdate", + [WB_STAT_SHRINK_PAGE_LIST] = "call: shrink_page_list", + [WB_STAT_FDATAWRITE] = "call: fdatawrite", + [WB_STAT_LAPTOP_TIMER] = "call: laptop_periodic", + [WB_STAT_FREE_MORE_MEM] = "call: free_more_memory", + + [WB_STAT_PG_COUNT_BASE + WB_STAT_BALANCE_DIRTY] = + "page: balance_dirty_pages", + [WB_STAT_PG_COUNT_BASE + WB_STAT_BG_WRITEOUT] = + "page: background_writeout", + [WB_STAT_PG_COUNT_BASE + WB_STAT_TRY_TO_FREE_PAGES] = + "page: try_to_free_pages", + [WB_STAT_PG_COUNT_BASE + WB_STAT_SYNC] = + "page: sync", + [WB_STAT_PG_COUNT_BASE + WB_STAT_KUPDATE] = + "page: kupdate", + [WB_STAT_PG_COUNT_BASE + WB_STAT_SHRINK_PAGE_LIST] = + "page: shrink_page_list", + [WB_STAT_PG_COUNT_BASE + WB_STAT_FDATAWRITE] = + "page: fdatawrite", + [WB_STAT_PG_COUNT_BASE + 
WB_STAT_LAPTOP_TIMER] = + "page: laptop_periodic", + [WB_STAT_PG_COUNT_BASE + WB_STAT_FREE_MORE_MEM] = + "page: free_more_memory", + + [WB_STAT_PERIODIC] = "periodic writeback", + [WB_STAT_SINGLE_INODE_WAIT] = "single inode wait", + [WB_STAT_WRITEBACK_WB_WAIT] = "writeback_wb wait", + [WB_STAT_METADATA_PAGES_CLEANED] = "metadata pages cleaned", +}; + +static void writeback_stats_collect(struct writeback_stats *src, + struct writeback_stats *target) +{ + int cpu; + for_each_online_cpu(cpu) { + int stat; + struct writeback_stats *stats = per_cpu_ptr(src, cpu); + for (stat = 0; stat < WB_STAT_MAX; stat++) + target->stats[stat] += stats->stats[stat]; + } +} + +static size_t writeback_stats_to_str(struct writeback_stats *stats, + char *buf, size_t len) +{ + int bufsize = len - 1; + int i, printed = 0; + for (i = 0; i < WB_STAT_MAX; i++) { + const char *label = wb_stats_labels[i]; + if (label == NULL) + continue; + printed += snprintf(buf + printed, bufsize - printed, + "%-32s %10llu\n", label, stats->stats[i]); + if (printed >= bufsize) { + buf[len - 1] = '\n'; + return len; + } + } + + buf[printed - 1] = '\n'; + return printed; +} + +size_t writeback_stats_print(struct writeback_stats *stats, + char *buf, size_t len) +{ + struct writeback_stats total; + memset(&total, 0, sizeof(total)); + writeback_stats_collect(stats, &total); + return writeback_stats_to_str(&total, buf, len); +} + + +static int writeback_seq_show(struct seq_file *m, void *data) +{ + char *buf; + size_t size; + switch ((enum writeback_op)m->private) { + case WB_STATS_OP: + size = seq_get_buf(m, &buf); + if (size == 0) + return 0; + size = writeback_stats_print(writeback_sys_stats, buf, size); + seq_commit(m, size); + break; + default: + break; + } + + return 0; +} + +static int writeback_open(struct inode *inode, struct file *file) +{ + return single_open(file, writeback_seq_show, PDE(inode)->data); +} + +static const struct file_operations writeback_ops = { + .open = writeback_open, + .read = 
seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + + +void __init proc_writeback_init(void) +{ + struct proc_dir_entry *base_dir; + base_dir = proc_mkdir("writeback", NULL); + if (base_dir == NULL) { + printk(KERN_ERR "Creating /proc/writeback/ failed"); + return; + } + + writeback_sys_stats = alloc_percpu(struct writeback_stats); + + proc_create_data("stats", S_IRUGO|S_IWUSR, base_dir, + &writeback_ops, (void *)WB_STATS_OP); +} diff --git a/fs/proc/root.c b/fs/proc/root.c index d6c3b41..a44e166 100644 --- a/fs/proc/root.c +++ b/fs/proc/root.c @@ -125,6 +125,7 @@ void __init proc_root_init(void) #endif proc_mkdir("bus", NULL); proc_sys_init(); + proc_writeback_init(); } static int proc_root_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat diff --git a/fs/sync.c b/fs/sync.c index c38ec16..f4c7085 100644 --- a/fs/sync.c +++ b/fs/sync.c @@ -98,7 +98,7 @@ static void sync_filesystems(int wait) */ SYSCALL_DEFINE0(sync) { - wakeup_flusher_threads(0); + wakeup_flusher_threads(0, WB_STAT_SYNC); sync_filesystems(0); sync_filesystems(1); if (unlikely(laptop_mode)) diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index 96f4094..648b06f 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -70,6 +70,7 @@ struct backing_dev_info { char *name; struct percpu_counter bdi_stat[NR_BDI_STAT_ITEMS]; + struct writeback_stats *wb_stat; struct prop_local_percpu completions; int dirty_exceeded; @@ -100,7 +101,8 @@ int bdi_register(struct backing_dev_info *bdi, struct device *parent, int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev); void bdi_unregister(struct backing_dev_info *bdi); int bdi_setup_and_register(struct backing_dev_info *, char *, unsigned int); -void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages); +void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages, + enum wb_stats stat); void bdi_start_background_writeback(struct backing_dev_info *bdi); int 
bdi_writeback_thread(void *data); int bdi_has_dirty_io(struct backing_dev_info *bdi); @@ -181,6 +183,10 @@ static inline s64 bdi_stat_sum(struct backing_dev_info *bdi, return sum; } +void bdi_writeback_stat_inc(struct backing_dev_info *bdi, enum wb_stats stat); +void bdi_writeback_stat_add(struct backing_dev_info *bdi, enum wb_stats stat, + unsigned long value); + extern void bdi_writeout_inc(struct backing_dev_info *bdi); /* diff --git a/include/linux/writeback.h b/include/linux/writeback.h index 17e7ccc..6fe2247 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -20,6 +20,54 @@ enum writeback_sync_modes { }; /* + * why this writeback was initiated + */ +enum wb_stats { + WB_STAT_BALANCE_DIRTY, + WB_STAT_BG_WRITEOUT, + WB_STAT_TRY_TO_FREE_PAGES, + WB_STAT_SYNC, + WB_STAT_KUPDATE, + WB_STAT_SHRINK_PAGE_LIST, + WB_STAT_FDATAWRITE, + WB_STAT_LAPTOP_TIMER, + WB_STAT_FREE_MORE_MEM, + + /* + * For event that you want both call count and pages associated + * with the call, add them above. + * + * [0 : WB_STAT_PAGES_COUNT_BASE - 1] stores number of calls + * pages count will be stored starting from WB_STAT_PAGES_COUNT_BASE + */ + WB_STAT_PG_COUNT_BASE, + + WB_STAT_PERIODIC = WB_STAT_PG_COUNT_BASE * 2, + WB_STAT_SINGLE_INODE_WAIT, + WB_STAT_WRITEBACK_WB_WAIT, + WB_STAT_METADATA_PAGES_CLEANED, + WB_STAT_MAX, +}; + +struct writeback_stats { + u64 stats[WB_STAT_MAX]; +}; + +extern struct writeback_stats *writeback_sys_stats; + +static inline struct writeback_stats *writeback_stats_alloc(void) +{ + return alloc_percpu(struct writeback_stats); +} + +static inline void writeback_stats_free(struct writeback_stats *stats) +{ + free_percpu(stats); +} + +size_t writeback_stats_print(struct writeback_stats *, char *buf, size_t); + +/* * A control structure which tells the writeback code what to do. These are * always on the stack, and hence need no locking. They are always initialised * in a manner such that unspecified fields are set to zero. 
@@ -65,7 +113,7 @@ void sync_inodes_sb(struct super_block *); void writeback_inodes_wb(struct bdi_writeback *wb, struct writeback_control *wbc); long wb_do_writeback(struct bdi_writeback *wb, int force_wait); -void wakeup_flusher_threads(long nr_pages); +void wakeup_flusher_threads(long nr_pages, enum wb_stats stat); /* writeback.h requires fs.h; it, too, is not included from here. */ static inline void wait_on_inode(struct inode *inode) diff --git a/mm/backing-dev.c b/mm/backing-dev.c index f032e6e..e69e492 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -138,6 +138,13 @@ static inline void bdi_debug_unregister(struct backing_dev_info *bdi) } #endif +static ssize_t writeback_stats_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct backing_dev_info *bdi = dev_get_drvdata(dev); + return writeback_stats_print(bdi->wb_stat, buf, PAGE_SIZE); +} + static ssize_t read_ahead_kb_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) @@ -210,6 +217,7 @@ static struct device_attribute bdi_dev_attrs[] = { __ATTR_RW(read_ahead_kb), __ATTR_RW(min_ratio), __ATTR_RW(max_ratio), + __ATTR_RO(writeback_stats), __ATTR_NULL, }; @@ -652,11 +660,18 @@ int bdi_init(struct backing_dev_info *bdi) goto err; } + bdi->wb_stat = writeback_stats_alloc(); + if (bdi->wb_stat == NULL) { + err = -ENOMEM; + goto err; + } + bdi->dirty_exceeded = 0; err = prop_local_init_percpu(&bdi->completions); if (err) { err: + writeback_stats_free(bdi->wb_stat); while (i--) percpu_counter_destroy(&bdi->bdi_stat[i]); } @@ -688,6 +703,8 @@ void bdi_destroy(struct backing_dev_info *bdi) for (i = 0; i < NR_BDI_STAT_ITEMS; i++) percpu_counter_destroy(&bdi->bdi_stat[i]); + writeback_stats_free(bdi->wb_stat); + prop_local_destroy_percpu(&bdi->completions); } EXPORT_SYMBOL(bdi_destroy); @@ -830,3 +847,27 @@ out: return ret; } EXPORT_SYMBOL(wait_iff_congested); + +void bdi_writeback_stat_add(struct backing_dev_info *bdi, enum wb_stats stat, + unsigned 
long value) +{ + if (bdi) { + struct writeback_stats *stats = bdi->wb_stat; + + BUG_ON(stat >= WB_STAT_MAX); + preempt_disable(); + stats = per_cpu_ptr(stats, smp_processor_id()); + stats->stats[stat] += value; + if (likely(writeback_sys_stats)) { + stats = per_cpu_ptr(writeback_sys_stats, + smp_processor_id()); + stats->stats[stat] += value; + } + preempt_enable(); + } +} + +void bdi_writeback_stat_inc(struct backing_dev_info *bdi, enum wb_stats stat) +{ + bdi_writeback_stat_add(bdi, stat, 1); +} diff --git a/mm/filemap.c b/mm/filemap.c index a8251a8..6ac8ce2 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -218,7 +218,14 @@ int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start, if (!mapping_cap_writeback_dirty(mapping)) return 0; + bdi_writeback_stat_inc(mapping->backing_dev_info, + WB_STAT_FDATAWRITE); + ret = do_writepages(mapping, &wbc); + + bdi_writeback_stat_add(mapping->backing_dev_info, + WB_STAT_FDATAWRITE + WB_STAT_PG_COUNT_BASE, + LONG_MAX - wbc.nr_to_write); return ret; } diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 31f6988..5333968 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -559,7 +559,11 @@ static void balance_dirty_pages(struct address_space *mapping, */ trace_wbc_balance_dirty_start(&wbc, bdi); if (bdi_nr_reclaimable > bdi_thresh) { + bdi_writeback_stat_inc(bdi, WB_STAT_BALANCE_DIRTY); writeback_inodes_wb(&bdi->wb, &wbc); + bdi_writeback_stat_add(bdi, + WB_STAT_BALANCE_DIRTY + WB_STAT_PG_COUNT_BASE, + write_chunk - wbc.nr_to_write); pages_written += write_chunk - wbc.nr_to_write; trace_wbc_balance_dirty_written(&wbc, bdi); if (pages_written >= write_chunk) @@ -703,7 +707,8 @@ void laptop_mode_timer_fn(unsigned long data) * threshold */ if (bdi_has_dirty_io(&q->backing_dev_info)) - bdi_start_writeback(&q->backing_dev_info, nr_pages); + bdi_start_writeback(&q->backing_dev_info, nr_pages, + WB_STAT_LAPTOP_TIMER); } /* diff --git a/mm/vmscan.c b/mm/vmscan.c index 5ed24b9..04cd8b1 100644 --- 
a/mm/vmscan.c +++ b/mm/vmscan.c @@ -447,6 +447,8 @@ static pageout_t pageout(struct page *page, struct address_space *mapping, }; SetPageReclaim(page); + bdi_writeback_stat_inc(mapping->backing_dev_info, + WB_STAT_SHRINK_PAGE_LIST); res = mapping->a_ops->writepage(page, &wbc); if (res < 0) handle_write_error(mapping, page, res); @@ -454,6 +456,11 @@ static pageout_t pageout(struct page *page, struct address_space *mapping, ClearPageReclaim(page); return PAGE_ACTIVATE; } + if (res >= 0) { + bdi_writeback_stat_inc(mapping->backing_dev_info, + WB_STAT_SHRINK_PAGE_LIST + + WB_STAT_PG_COUNT_BASE); + } /* * Wait on writeback if requested to. This happens when @@ -2131,7 +2138,8 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist, */ writeback_threshold = sc->nr_to_reclaim + sc->nr_to_reclaim / 2; if (total_scanned > writeback_threshold) { - wakeup_flusher_threads(laptop_mode ? 0 : total_scanned); + wakeup_flusher_threads(laptop_mode ? 0 : total_scanned, + WB_STAT_TRY_TO_FREE_PAGES); sc->may_writepage = 1; } -- 1.7.3.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
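For readers following the counter indexing in the patch above, here is a minimal userspace sketch (not kernel code) of how the stats array appears to be laid out: each writeback reason below WB_STAT_PG_COUNT_BASE has a call counter at its own index and a page counter at reason + WB_STAT_PG_COUNT_BASE, and the per-CPU copies are summed only when the file is read. The enum is trimmed to a few entries, and NR_MODEL_CPUS, model_percpu, model_stat_add() and model_stats_collect() are hypothetical names standing in for the alloc_percpu() storage, bdi_writeback_stat_add() and writeback_stats_collect().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Trimmed-down copy of the enum from the patch (userspace model only). */
enum wb_stats {
	WB_STAT_BALANCE_DIRTY,
	WB_STAT_BG_WRITEOUT,
	WB_STAT_SYNC,
	WB_STAT_KUPDATE,
	/* Reasons above get both a call counter and a page counter. */
	WB_STAT_PG_COUNT_BASE,
	/* Page counters occupy [BASE, 2 * BASE); plain counters follow. */
	WB_STAT_PERIODIC = WB_STAT_PG_COUNT_BASE * 2,
	WB_STAT_MAX,
};

/* Hypothetical: a fixed CPU count stands in for the per-CPU allocation. */
#define NR_MODEL_CPUS 4

struct writeback_stats {
	uint64_t stats[WB_STAT_MAX];
};

/* One slot per modelled CPU, as alloc_percpu() would provide in the kernel. */
static struct writeback_stats model_percpu[NR_MODEL_CPUS];

/* Model of bdi_writeback_stat_add(): bump one counter in one CPU's slot. */
static void model_stat_add(int cpu, enum wb_stats stat, uint64_t value)
{
	assert(cpu >= 0 && cpu < NR_MODEL_CPUS);
	assert(stat < WB_STAT_MAX);	/* the patch uses BUG_ON() here */
	model_percpu[cpu].stats[stat] += value;
}

/* Model of writeback_stats_collect(): sum every CPU's slot into target. */
static void model_stats_collect(struct writeback_stats *target)
{
	memset(target, 0, sizeof(*target));
	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		for (int stat = 0; stat < WB_STAT_MAX; stat++)
			target->stats[stat] += model_percpu[cpu].stats[stat];
}
```

In this model a writer only ever touches its own CPU's slot, which is why the patch can get away with preempt_disable() rather than a lock on the update path; the cost is that a reader sees a sum that may be slightly stale.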
* Re: Writeback tests 2011-07-20 21:49 ` Curt Wohlgemuth @ 2011-08-10 16:54 ` Christoph Hellwig 0 siblings, 0 replies; 7+ messages in thread From: Christoph Hellwig @ 2011-08-10 16:54 UTC (permalink / raw) To: Curt Wohlgemuth Cc: Christoph Hellwig, Wu Fengguang, Jan Kara, Andrew Morton, Dave Chinner, Michael Rubin, linux-fsdevel Looks like this mostly got lost in the noise. Can you resend it with a proper subject to linux-mm and fsdevel outside of this thread? A few comments that could be addressed below: > @@ -39,6 +39,7 @@ struct wb_writeback_work { > unsigned int for_kupdate:1; > unsigned int range_cyclic:1; > unsigned int for_background:1; > + int why; Needs an explanation of what it really is. Also, maybe "reason" is a better name for the variable? > @@ -554,6 +562,9 @@ static int writeback_sb_inodes(struct super_block *sb, struct bdi_writeback *wb, > */ > redirty_tail(inode); > } > + if (inode->i_ino == 0) > + bdi_writeback_stat_add(wb->bdi, > + WB_STAT_METADATA_PAGES_CLEANED, wrote); inode->i_ino == 0 doesn't necessarily imply it's metadata. A filesystem might simply not use the VFS inode number, or use a data type that is too large and gets truncated. But even when you check for inodes on the block device filesystem, people can still use those for data I/O. > @@ -724,6 +737,12 @@ static long wb_writeback(struct bdi_writeback *wb, > writeback_inodes_wb(wb, &wbc); > trace_wbc_writeback_written(&wbc, wb->bdi); > > + if (work->why < WB_STAT_PG_COUNT_BASE) { > + bdi_writeback_stat_add(wb->bdi, > + work->why + WB_STAT_PG_COUNT_BASE, > + write_chunk - wbc.nr_to_write); > + } > + Can you explain the WB_STAT_PG_COUNT_BASE magic a bit better? Maybe hiding it in a helper would be useful, which would also get the comments. > work->nr_pages -= write_chunk - wbc.nr_to_write; > wrote += write_chunk - wbc.nr_to_write; Given how often we do this calculation, it would be good to pull it into a local variable.
> @@ -1192,6 +1217,7 @@ void writeback_inodes_sb_nr(struct super_block *sb, unsigned long nr) > .sync_mode = WB_SYNC_NONE, > .done = &done, > .nr_pages = nr, > + .why = WB_STAT_SYNC, /* XXX: Not always correct */ > }; > > WARN_ON(!rwsem_is_locked(&sb->s_umount)); > @@ -1270,6 +1296,7 @@ void sync_inodes_sb(struct super_block *sb) > .nr_pages = LONG_MAX, > .range_cyclic = 0, > .done = &done, > + .why = WB_STAT_SYNC, > }; Indeed. Either you want to pass the argument from the caller, or use writeback_inodes_sb/sync_inodes_sb as the stat name and thus make it even more fine-grained. > diff --git a/fs/proc/proc_writeback.c b/fs/proc/proc_writeback.c > new file mode 100644 > index 0000000..4614697 > --- /dev/null > +++ b/fs/proc/proc_writeback.c I think the details of the stats file shouldn't be in fs/proc/ but in mm/ or fs/ where they are accumulated. Last but not least, you really should pass the why/reason argument to the VM tracepoints. Maybe just passing the reason down and adding it to the tracepoints could be patch 1/2, with the second one being the actual statistics counters. ^ permalink raw reply [flat|nested] 7+ messages in thread
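As an illustration of two of the review comments above (hiding the WB_STAT_PG_COUNT_BASE offset in a helper, and computing the written-page count once), here is a rough userspace sketch. It is not from the thread and not kernel code: wb_stat_has_page_count(), wb_stat_page_index() and model_account_chunk() are hypothetical names, the enum is a trimmed copy of the one in the patch, and the plain array stands in for the per-BDI counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Trimmed-down copy of the enum from the patch (userspace model only). */
enum wb_stats {
	WB_STAT_BALANCE_DIRTY,
	WB_STAT_BG_WRITEOUT,
	WB_STAT_SYNC,
	WB_STAT_KUPDATE,
	WB_STAT_PG_COUNT_BASE,
	WB_STAT_PERIODIC = WB_STAT_PG_COUNT_BASE * 2,
	WB_STAT_MAX,
};

/* Hypothetical helper: does this reason have a paired page counter? */
static inline bool wb_stat_has_page_count(enum wb_stats reason)
{
	return reason < WB_STAT_PG_COUNT_BASE;
}

/*
 * Hypothetical helper: index of the page counter paired with a reason.
 * Call counters live in [0, BASE), page counters in [BASE, 2 * BASE).
 */
static inline enum wb_stats wb_stat_page_index(enum wb_stats reason)
{
	assert(wb_stat_has_page_count(reason));
	return (enum wb_stats)(reason + WB_STAT_PG_COUNT_BASE);
}

/*
 * Model of the accounting step in wb_writeback(): the repeated
 * "write_chunk - wbc.nr_to_write" expression is computed once and
 * reused for the stats update and the remaining-work update.
 */
static long model_account_chunk(unsigned long long *stats,
				enum wb_stats reason, long write_chunk,
				long nr_to_write_left, long *nr_pages)
{
	long written = write_chunk - nr_to_write_left;

	if (wb_stat_has_page_count(reason))
		stats[wb_stat_page_index(reason)] += written;
	*nr_pages -= written;
	return written;
}
```

With helpers like these the explanatory comment has a single place to live, and call sites no longer need to know how the enum is laid out.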