* [RFC PATCH 0/2] add some stats in es shrinker and do some cleanups in tracepoint @ 2014-01-14 8:06 Zheng Liu 2014-01-14 8:06 ` [RFC PATCH 1/2] ext4: improve extents status tree trace point Zheng Liu 2014-01-14 8:06 ` [RFC PATCH 2/2] ext4: track extent status tree shrinker delay statictics Zheng Liu 0 siblings, 2 replies; 8+ messages in thread From: Zheng Liu @ 2014-01-14 8:06 UTC (permalink / raw) To: linux-ext4; +Cc: Zheng Liu, Theodore Ts'o, Andreas Dilger, Jan Kara Hi all, This patch set contains two patches. The first one is a trival patch, which does some cleanups for the trace point of extent status tree shrinker. The second one is to add some statistics for extent status tree shrinker in order that we want to collect some more details when we encounter a stall. As discussion with Ted and other folks, we need to add some counters to reflect the behaviour of the shrinker, and later we can use these counters to measure the performance for the shrinker improvements. Meanwhile these information can help us to reproduce the problem when the user reports that their machine is stall caused by the shrinker. Any comment or suggestion are always welcome. Thanks, - Zheng Zheng Liu (2): ext4: improve extents status tree trace point ext4: track extent status tree shrinker delay statictics fs/ext4/ext4.h | 4 +- fs/ext4/extents_status.c | 177 +++++++++++++++++++++++++++++++++++++++---- fs/ext4/extents_status.h | 10 ++- fs/ext4/super.c | 17 ++--- include/trace/events/ext4.h | 59 +++++++++++++-- 5 files changed, 233 insertions(+), 34 deletions(-) -- 1.7.9.7 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [RFC PATCH 1/2] ext4: improve extents status tree trace point 2014-01-14 8:06 [RFC PATCH 0/2] add some stats in es shrinker and do some cleanups in tracepoint Zheng Liu @ 2014-01-14 8:06 ` Zheng Liu 2014-01-14 13:14 ` Jan Kara 2014-01-14 8:06 ` [RFC PATCH 2/2] ext4: track extent status tree shrinker delay statictics Zheng Liu 1 sibling, 1 reply; 8+ messages in thread From: Zheng Liu @ 2014-01-14 8:06 UTC (permalink / raw) To: linux-ext4; +Cc: Theodore Ts'o, Andreas Dilger, Jan Kara, Zheng Liu From: Zheng Liu <wenqing.lz@taobao.com> This commit improves the trace point of extents status tree. We rename trace_ext4_es_shrink_enter in ext4_es_count() because it is also used in ext4_es_scan() and we don't identify them from the result. Further this commit fixes a variable name in trace point in order to keep consistency with others. Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> --- fs/ext4/extents_status.c | 6 +++--- include/trace/events/ext4.h | 28 ++++++++++++++++++++-------- 2 files changed, 23 insertions(+), 11 deletions(-) diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c index 3981ff7..eb7ae61 100644 --- a/fs/ext4/extents_status.c +++ b/fs/ext4/extents_status.c @@ -1017,7 +1017,7 @@ static unsigned long ext4_es_count(struct shrinker *shrink, sbi = container_of(shrink, struct ext4_sb_info, s_es_shrinker); nr = percpu_counter_read_positive(&sbi->s_extent_cache_cnt); - trace_ext4_es_shrink_enter(sbi->s_sb, sc->nr_to_scan, nr); + trace_ext4_es_shrink_count(sbi->s_sb, sc->nr_to_scan, nr); return nr; } @@ -1030,14 +1030,14 @@ static unsigned long ext4_es_scan(struct shrinker *shrink, int ret, nr_shrunk; ret = percpu_counter_read_positive(&sbi->s_extent_cache_cnt); - trace_ext4_es_shrink_enter(sbi->s_sb, nr_to_scan, ret); + trace_ext4_es_shrink_scan_enter(sbi->s_sb, nr_to_scan, ret); if (!nr_to_scan) return ret; nr_shrunk = __ext4_es_shrink(sbi, nr_to_scan, NULL); - trace_ext4_es_shrink_exit(sbi->s_sb, nr_shrunk, ret); + trace_ext4_es_shrink_scan_exit(sbi->s_sb, nr_shrunk, ret); return nr_shrunk; } diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h index 197d312..35d28f8 100644 --- a/include/trace/events/ext4.h +++ b/include/trace/events/ext4.h @@ -2366,7 +2366,7 @@ TRACE_EVENT(ext4_es_lookup_extent_exit, show_extent_status(__entry->found ? __entry->status : 0)) ); -TRACE_EVENT(ext4_es_shrink_enter, +DECLARE_EVENT_CLASS(ext4__es_shrink_enter, TP_PROTO(struct super_block *sb, int nr_to_scan, int cache_cnt), TP_ARGS(sb, nr_to_scan, cache_cnt), @@ -2388,26 +2388,38 @@ TRACE_EVENT(ext4_es_shrink_enter, __entry->nr_to_scan, __entry->cache_cnt) ); -TRACE_EVENT(ext4_es_shrink_exit, - TP_PROTO(struct super_block *sb, int shrunk_nr, int cache_cnt), +DEFINE_EVENT(ext4__es_shrink_enter, ext4_es_shrink_count, + TP_PROTO(struct super_block *sb, int nr_to_scan, int cache_cnt), + + TP_ARGS(sb, nr_to_scan, cache_cnt) +); + +DEFINE_EVENT(ext4__es_shrink_enter, ext4_es_shrink_scan_enter, + TP_PROTO(struct super_block *sb, int nr_to_scan, int cache_cnt), + + TP_ARGS(sb, nr_to_scan, cache_cnt) +); + +TRACE_EVENT(ext4_es_shrink_scan_exit, + TP_PROTO(struct super_block *sb, int nr_shrunk, int cache_cnt), - TP_ARGS(sb, shrunk_nr, cache_cnt), + TP_ARGS(sb, nr_shrunk, cache_cnt), TP_STRUCT__entry( __field( dev_t, dev ) - __field( int, shrunk_nr ) + __field( int, nr_shrunk ) __field( int, cache_cnt ) ), TP_fast_assign( __entry->dev = sb->s_dev; - __entry->shrunk_nr = shrunk_nr; + __entry->nr_shrunk = nr_shrunk; __entry->cache_cnt = cache_cnt; ), - TP_printk("dev %d,%d shrunk_nr %d cache_cnt %d", + TP_printk("dev %d,%d nr_shrunk %d cache_cnt %d", MAJOR(__entry->dev), MINOR(__entry->dev), - __entry->shrunk_nr, __entry->cache_cnt) + __entry->nr_shrunk, __entry->cache_cnt) ); #endif /* _TRACE_EXT4_H */ -- 1.7.9.7 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [RFC PATCH 1/2] ext4: improve extents status tree trace point 2014-01-14 8:06 ` [RFC PATCH 1/2] ext4: improve extents status tree trace point Zheng Liu @ 2014-01-14 13:14 ` Jan Kara 0 siblings, 0 replies; 8+ messages in thread From: Jan Kara @ 2014-01-14 13:14 UTC (permalink / raw) To: Zheng Liu Cc: linux-ext4, Theodore Ts'o, Andreas Dilger, Jan Kara, Zheng Liu On Tue 14-01-14 16:06:40, Zheng Liu wrote: > From: Zheng Liu <wenqing.lz@taobao.com> > > This commit improves the trace point of extents status tree. We rename > trace_ext4_es_shrink_enter in ext4_es_count() because it is also used > in ext4_es_scan() and we don't identify them from the result. > > Further this commit fixes a variable name in trace point in order to > keep consistency with others. The patch looks good. You can add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > > Cc: "Theodore Ts'o" <tytso@mit.edu> > Cc: Andreas Dilger <adilger.kernel@dilger.ca> > Cc: Jan Kara <jack@suse.cz> > Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> > --- > fs/ext4/extents_status.c | 6 +++--- > include/trace/events/ext4.h | 28 ++++++++++++++++++++-------- > 2 files changed, 23 insertions(+), 11 deletions(-) > > diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c > index 3981ff7..eb7ae61 100644 > --- a/fs/ext4/extents_status.c > +++ b/fs/ext4/extents_status.c > @@ -1017,7 +1017,7 @@ static unsigned long ext4_es_count(struct shrinker *shrink, > > sbi = container_of(shrink, struct ext4_sb_info, s_es_shrinker); > nr = percpu_counter_read_positive(&sbi->s_extent_cache_cnt); > - trace_ext4_es_shrink_enter(sbi->s_sb, sc->nr_to_scan, nr); > + trace_ext4_es_shrink_count(sbi->s_sb, sc->nr_to_scan, nr); > return nr; > } > > @@ -1030,14 +1030,14 @@ static unsigned long ext4_es_scan(struct shrinker *shrink, > int ret, nr_shrunk; > > ret = percpu_counter_read_positive(&sbi->s_extent_cache_cnt); > - trace_ext4_es_shrink_enter(sbi->s_sb, nr_to_scan, ret); > + trace_ext4_es_shrink_scan_enter(sbi->s_sb, nr_to_scan, ret); > > if (!nr_to_scan) > return ret; > > nr_shrunk = __ext4_es_shrink(sbi, nr_to_scan, NULL); > > - trace_ext4_es_shrink_exit(sbi->s_sb, nr_shrunk, ret); > + trace_ext4_es_shrink_scan_exit(sbi->s_sb, nr_shrunk, ret); > return nr_shrunk; > } > > diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h > index 197d312..35d28f8 100644 > --- a/include/trace/events/ext4.h > +++ b/include/trace/events/ext4.h > @@ -2366,7 +2366,7 @@ TRACE_EVENT(ext4_es_lookup_extent_exit, > show_extent_status(__entry->found ? __entry->status : 0)) > ); > > -TRACE_EVENT(ext4_es_shrink_enter, > +DECLARE_EVENT_CLASS(ext4__es_shrink_enter, > TP_PROTO(struct super_block *sb, int nr_to_scan, int cache_cnt), > > TP_ARGS(sb, nr_to_scan, cache_cnt), > @@ -2388,26 +2388,38 @@ TRACE_EVENT(ext4_es_shrink_enter, > __entry->nr_to_scan, __entry->cache_cnt) > ); > > -TRACE_EVENT(ext4_es_shrink_exit, > - TP_PROTO(struct super_block *sb, int shrunk_nr, int cache_cnt), > +DEFINE_EVENT(ext4__es_shrink_enter, ext4_es_shrink_count, > + TP_PROTO(struct super_block *sb, int nr_to_scan, int cache_cnt), > + > + TP_ARGS(sb, nr_to_scan, cache_cnt) > +); > + > +DEFINE_EVENT(ext4__es_shrink_enter, ext4_es_shrink_scan_enter, > + TP_PROTO(struct super_block *sb, int nr_to_scan, int cache_cnt), > + > + TP_ARGS(sb, nr_to_scan, cache_cnt) > +); > + > +TRACE_EVENT(ext4_es_shrink_scan_exit, > + TP_PROTO(struct super_block *sb, int nr_shrunk, int cache_cnt), > > - TP_ARGS(sb, shrunk_nr, cache_cnt), > + TP_ARGS(sb, nr_shrunk, cache_cnt), > > TP_STRUCT__entry( > __field( dev_t, dev ) > - __field( int, shrunk_nr ) > + __field( int, nr_shrunk ) > __field( int, cache_cnt ) > ), > > TP_fast_assign( > __entry->dev = sb->s_dev; > - __entry->shrunk_nr = shrunk_nr; > + __entry->nr_shrunk = nr_shrunk; > __entry->cache_cnt = cache_cnt; > ), > > - TP_printk("dev %d,%d shrunk_nr %d cache_cnt %d", > + TP_printk("dev %d,%d nr_shrunk %d cache_cnt %d", > MAJOR(__entry->dev), MINOR(__entry->dev), > - __entry->shrunk_nr, __entry->cache_cnt) > + __entry->nr_shrunk, __entry->cache_cnt) > ); > > #endif /* _TRACE_EXT4_H */ > -- > 1.7.9.7 > -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 8+ messages in thread
* [RFC PATCH 2/2] ext4: track extent status tree shrinker delay statictics 2014-01-14 8:06 [RFC PATCH 0/2] add some stats in es shrinker and do some cleanups in tracepoint Zheng Liu 2014-01-14 8:06 ` [RFC PATCH 1/2] ext4: improve extents status tree trace point Zheng Liu @ 2014-01-14 8:06 ` Zheng Liu 2014-01-14 13:24 ` Jan Kara 1 sibling, 1 reply; 8+ messages in thread From: Zheng Liu @ 2014-01-14 8:06 UTC (permalink / raw) To: linux-ext4; +Cc: Theodore Ts'o, Andreas Dilger, Jan Kara, Zheng Liu From: Zheng Liu <wenqing.lz@taobao.com> This commit adds some statictics in extent status tree shrinker. The purpose to add these is that we want to collect more details when we encounter a stall caused by extent status tree shrinker. Here we count the following statictics: stats: the number of all objects on all extent status trees the number of reclaimable objects on lru list the last sorted interval the number of inodes on lru list average: scan time for shrinking some objects the number of shrunk objects maximum: the inode that has max nr. of objects on lru list The output looks like below: $ cat /proc/fs/ext4/sda1/es_shrinker_info stats: 28228 objects 6341 reclaimable objects 586ms last sorted interval 250 inodes on lru list average: 153us scan time 128 shrunk objects maximum: 255 inode (255 objects, 198 reclaimable) If the lru list has never been sorted, the following line will not be printed: 586ms last sorted interval If there is an empty lru list, the following lines also will not be printed: 250 inodes on lru list ... maximum: 255 inode (255 objects, 198 reclaimable) Meanwhile in this commit a new trace point is defined to print some details in __ext4_es_shrink(). Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> --- fs/ext4/ext4.h | 4 +- fs/ext4/extents_status.c | 171 ++++++++++++++++++++++++++++++++++++++++--- fs/ext4/extents_status.h | 10 ++- fs/ext4/super.c | 17 ++--- include/trace/events/ext4.h | 31 ++++++++ 5 files changed, 210 insertions(+), 23 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index ece5556..23ce6b7 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -885,6 +885,7 @@ struct ext4_inode_info { struct ext4_es_tree i_es_tree; rwlock_t i_es_lock; struct list_head i_es_lru; + unsigned int i_es_all_nr; /* protected by i_es_lock */ unsigned int i_es_lru_nr; /* protected by i_es_lock */ unsigned long i_touch_when; /* jiffies of last accessing */ @@ -1322,8 +1323,7 @@ struct ext4_sb_info { /* Reclaim extents from extent status tree */ struct shrinker s_es_shrinker; struct list_head s_es_lru; - unsigned long s_es_last_sorted; - struct percpu_counter s_extent_cache_cnt; + struct ext4_es_stats s_es_stats; spinlock_t s_es_lru_lock ____cacheline_aligned_in_smp; /* Ratelimit ext4 messages. */ diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c index eb7ae61..5cb25c5 100644 --- a/fs/ext4/extents_status.c +++ b/fs/ext4/extents_status.c @@ -11,6 +11,8 @@ */ #include <linux/rbtree.h> #include <linux/list_sort.h> +#include <linux/proc_fs.h> +#include <linux/seq_file.h> #include "ext4.h" #include "extents_status.h" @@ -313,19 +315,27 @@ ext4_es_alloc_extent(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len, */ if (!ext4_es_is_delayed(es)) { EXT4_I(inode)->i_es_lru_nr++; - percpu_counter_inc(&EXT4_SB(inode->i_sb)->s_extent_cache_cnt); + percpu_counter_inc(&EXT4_SB(inode->i_sb)-> + s_es_stats.es_stats_lru_cnt); } + EXT4_I(inode)->i_es_all_nr++; + percpu_counter_inc(&EXT4_SB(inode->i_sb)->s_es_stats.es_stats_all_cnt); + return es; } static void ext4_es_free_extent(struct inode *inode, struct extent_status *es) { + EXT4_I(inode)->i_es_all_nr--; + percpu_counter_dec(&EXT4_SB(inode->i_sb)->s_es_stats.es_stats_all_cnt); + /* Decrease the lru counter when this es is not delayed */ if (!ext4_es_is_delayed(es)) { BUG_ON(EXT4_I(inode)->i_es_lru_nr == 0); EXT4_I(inode)->i_es_lru_nr--; - percpu_counter_dec(&EXT4_SB(inode->i_sb)->s_extent_cache_cnt); + percpu_counter_dec(&EXT4_SB(inode->i_sb)-> + s_es_stats.es_stats_lru_cnt); } kmem_cache_free(ext4_es_cachep, es); @@ -929,11 +939,16 @@ static int __ext4_es_shrink(struct ext4_sb_info *sbi, int nr_to_scan, struct ext4_inode_info *locked_ei) { struct ext4_inode_info *ei; + struct ext4_es_stats *es_stats; struct list_head *cur, *tmp; LIST_HEAD(skipped); + ktime_t start_time; + u64 scan_time; int nr_shrunk = 0; int retried = 0, skip_precached = 1, nr_skipped = 0; + es_stats = &sbi->s_es_stats; + start_time = ktime_get(); spin_lock(&sbi->s_es_lru_lock); retry: @@ -944,7 +959,8 @@ retry: * If we have already reclaimed all extents from extent * status tree, just stop the loop immediately. */ - if (percpu_counter_read_positive(&sbi->s_extent_cache_cnt) == 0) + if (percpu_counter_read_positive( + &es_stats->es_stats_lru_cnt) == 0) break; ei = list_entry(cur, struct ext4_inode_info, i_es_lru); @@ -954,7 +970,7 @@ retry: * time. Normally we try hard to avoid shrinking * precached inodes, but we will as a last resort. */ - if ((sbi->s_es_last_sorted < ei->i_touch_when) || + if ((es_stats->es_stats_last_sorted < ei->i_touch_when) || (skip_precached && ext4_test_inode_state(&ei->vfs_inode, EXT4_STATE_EXT_PRECACHED))) { nr_skipped++; @@ -988,7 +1004,7 @@ retry: if ((nr_shrunk == 0) && nr_skipped && !retried) { retried++; list_sort(NULL, &sbi->s_es_lru, ext4_inode_touch_time_cmp); - sbi->s_es_last_sorted = jiffies; + es_stats->es_stats_last_sorted = jiffies; ei = list_first_entry(&sbi->s_es_lru, struct ext4_inode_info, i_es_lru); /* @@ -1006,6 +1022,20 @@ retry: if (locked_ei && nr_shrunk == 0) nr_shrunk = __es_try_to_reclaim_extents(locked_ei, nr_to_scan); + scan_time = ktime_to_ns(ktime_sub(ktime_get(), start_time)); + if (likely(es_stats->es_stats_scan_time)) + es_stats->es_stats_scan_time = (scan_time + + es_stats->es_stats_scan_time*3) / 4; + else + es_stats->es_stats_scan_time = scan_time; + if (likely(es_stats->es_stats_shrunk)) + es_stats->es_stats_shrunk = (nr_shrunk + + es_stats->es_stats_shrunk*3) / 4; + else + es_stats->es_stats_shrunk = nr_shrunk; + + trace_ext4_es_shrink(sbi->s_sb, nr_shrunk, scan_time, skip_precached, + nr_skipped, retried); return nr_shrunk; } @@ -1016,7 +1046,7 @@ static unsigned long ext4_es_count(struct shrinker *shrink, struct ext4_sb_info *sbi; sbi = container_of(shrink, struct ext4_sb_info, s_es_shrinker); - nr = percpu_counter_read_positive(&sbi->s_extent_cache_cnt); + nr = percpu_counter_read_positive(&sbi->s_es_stats.es_stats_lru_cnt); trace_ext4_es_shrink_count(sbi->s_sb, sc->nr_to_scan, nr); return nr; } @@ -1029,7 +1059,7 @@ static unsigned long ext4_es_scan(struct shrinker *shrink, int nr_to_scan = sc->nr_to_scan; int ret, nr_shrunk; - ret = percpu_counter_read_positive(&sbi->s_extent_cache_cnt); + ret = percpu_counter_read_positive(&sbi->s_es_stats.es_stats_lru_cnt); trace_ext4_es_shrink_scan_enter(sbi->s_sb, nr_to_scan, ret); if (!nr_to_scan) @@ -1041,19 +1071,140 @@ static unsigned long ext4_es_scan(struct shrinker *shrink, return nr_shrunk; } -void ext4_es_register_shrinker(struct ext4_sb_info *sbi) +static void *ext4_es_seq_shrinker_info_start(struct seq_file *seq, loff_t *pos) +{ + return *pos ? NULL : SEQ_START_TOKEN; +} + +static void * +ext4_es_seq_shrinker_info_next(struct seq_file *seq, void *v, loff_t *pos) +{ + return NULL; +} + +static int ext4_es_seq_shrinker_info_show(struct seq_file *seq, void *v) +{ + struct ext4_sb_info *sbi = seq->private; + struct ext4_es_stats *es_stats = &sbi->s_es_stats; + struct ext4_inode_info *ei, *max = NULL; + unsigned int inode_cnt = 0; + + if (v != SEQ_START_TOKEN) + return 0; + + /* here we just find an inode that has the max nr. of objects */ + spin_lock(&sbi->s_es_lru_lock); + list_for_each_entry(ei, &sbi->s_es_lru, i_es_lru) { + inode_cnt++; + if (max && max->i_es_all_nr < ei->i_es_all_nr) + max = ei; + else if (!max) + max = ei; + } + spin_unlock(&sbi->s_es_lru_lock); + + seq_printf(seq, "stats:\n %lld objects\n %lld reclaimable objects\n", + percpu_counter_sum_positive(&es_stats->es_stats_all_cnt), + percpu_counter_sum_positive(&es_stats->es_stats_lru_cnt)); + if (es_stats->es_stats_last_sorted != 0) + seq_printf(seq, " %ums last sorted interval\n", + jiffies_to_msecs(jiffies - + es_stats->es_stats_last_sorted)); + if (inode_cnt) + seq_printf(seq, " %d inodes on lru list\n", inode_cnt); + + seq_printf(seq, "average:\n %lluus scan time\n", + div_u64(es_stats->es_stats_scan_time, 1000)); + seq_printf(seq, " %lu shrunk objects\n", es_stats->es_stats_shrunk); + if (inode_cnt) + seq_printf(seq, + "maximum:\n %lu inode (%u objects, %u reclaimable)\n", + max->vfs_inode.i_ino, max->i_es_all_nr, max->i_es_lru_nr); + + return 0; +} + +static void ext4_es_seq_shrinker_info_stop(struct seq_file *seq, void *v) { +} + +static const struct seq_operations ext4_es_seq_shrinker_info_ops = { + .start = ext4_es_seq_shrinker_info_start, + .next = ext4_es_seq_shrinker_info_next, + .stop = ext4_es_seq_shrinker_info_stop, + .show = ext4_es_seq_shrinker_info_show, +}; + +static int +ext4_es_seq_shrinker_info_open(struct inode *inode, struct file *file) +{ + int ret; + + ret = seq_open(file, &ext4_es_seq_shrinker_info_ops); + if (!ret) { + struct seq_file *m = file->private_data; + m->private = PDE_DATA(inode); + } + + return ret; +} + +static int +ext4_es_seq_shrinker_info_release(struct inode *inode, struct file *file) +{ + return seq_release(inode, file); +} + +static const struct file_operations ext4_es_seq_shrinker_info_fops = { + .owner = THIS_MODULE, + .open = ext4_es_seq_shrinker_info_open, + .read = seq_read, + .llseek = seq_lseek, + .release = ext4_es_seq_shrinker_info_release, +}; + +int ext4_es_register_shrinker(struct ext4_sb_info *sbi) +{ + int err; + INIT_LIST_HEAD(&sbi->s_es_lru); spin_lock_init(&sbi->s_es_lru_lock); - sbi->s_es_last_sorted = 0; + sbi->s_es_stats.es_stats_last_sorted = 0; + sbi->s_es_stats.es_stats_shrunk = 0; + sbi->s_es_stats.es_stats_scan_time = 0; + err = percpu_counter_init(&sbi->s_es_stats.es_stats_all_cnt, 0); + if (err) + return err; + err = percpu_counter_init(&sbi->s_es_stats.es_stats_lru_cnt, 0); + if (err) + goto err1; + sbi->s_es_shrinker.scan_objects = ext4_es_scan; sbi->s_es_shrinker.count_objects = ext4_es_count; sbi->s_es_shrinker.seeks = DEFAULT_SEEKS; - register_shrinker(&sbi->s_es_shrinker); + err = register_shrinker(&sbi->s_es_shrinker); + if (err) + goto err2; + + if (sbi->s_proc) + proc_create_data("es_shrinker_info", S_IRUGO, sbi->s_proc, + &ext4_es_seq_shrinker_info_fops, sbi); + + return 0; + +err2: + percpu_counter_destroy(&sbi->s_es_stats.es_stats_lru_cnt); +err1: + percpu_counter_destroy(&sbi->s_es_stats.es_stats_all_cnt); + return err; } void ext4_es_unregister_shrinker(struct ext4_sb_info *sbi) { + if (sbi->s_proc) + remove_proc_entry("es_shrinker_info", sbi->s_proc); + percpu_counter_destroy(&sbi->s_es_stats.es_stats_all_cnt); + percpu_counter_destroy(&sbi->s_es_stats.es_stats_lru_cnt); unregister_shrinker(&sbi->s_es_shrinker); } diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h index 167f4ab8..37a4313 100644 --- a/fs/ext4/extents_status.h +++ b/fs/ext4/extents_status.h @@ -64,6 +64,14 @@ struct ext4_es_tree { struct extent_status *cache_es; /* recently accessed extent */ }; +struct ext4_es_stats { + unsigned long es_stats_last_sorted; + unsigned long es_stats_shrunk; + u64 es_stats_scan_time; + struct percpu_counter es_stats_all_cnt; + struct percpu_counter es_stats_lru_cnt; +}; + extern int __init ext4_init_es(void); extern void ext4_exit_es(void); extern void ext4_es_init_tree(struct ext4_es_tree *tree); @@ -129,7 +137,7 @@ static inline void ext4_es_store_status(struct extent_status *es, (es->es_pblk & ~ES_MASK)); } -extern void ext4_es_register_shrinker(struct ext4_sb_info *sbi); +extern int ext4_es_register_shrinker(struct ext4_sb_info *sbi); extern void ext4_es_unregister_shrinker(struct ext4_sb_info *sbi); extern void ext4_es_lru_add(struct inode *inode); extern void ext4_es_lru_del(struct inode *inode); diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 1f7784d..9c3a8bd 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -819,7 +819,6 @@ static void ext4_put_super(struct super_block *sb) percpu_counter_destroy(&sbi->s_freeinodes_counter); percpu_counter_destroy(&sbi->s_dirs_counter); percpu_counter_destroy(&sbi->s_dirtyclusters_counter); - percpu_counter_destroy(&sbi->s_extent_cache_cnt); brelse(sbi->s_sbh); #ifdef CONFIG_QUOTA for (i = 0; i < MAXQUOTAS; i++) @@ -879,6 +878,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb) ext4_es_init_tree(&ei->i_es_tree); rwlock_init(&ei->i_es_lock); INIT_LIST_HEAD(&ei->i_es_lru); + ei->i_es_all_nr = 0; ei->i_es_lru_nr = 0; ei->i_touch_when = 0; ei->i_reserved_data_blocks = 0; @@ -3869,10 +3869,11 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) sbi->s_err_report.data = (unsigned long) sb; /* Register extent status tree shrinker */ - ext4_es_register_shrinker(sbi); - - err = percpu_counter_init(&sbi->s_freeclusters_counter, - ext4_count_free_clusters(sb)); + err = ext4_es_register_shrinker(sbi); + if (!err) { + err = percpu_counter_init(&sbi->s_freeclusters_counter, + ext4_count_free_clusters(sb)); + } if (!err) { err = percpu_counter_init(&sbi->s_freeinodes_counter, ext4_count_free_inodes(sb)); @@ -3884,9 +3885,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) if (!err) { err = percpu_counter_init(&sbi->s_dirtyclusters_counter, 0); } - if (!err) { - err = percpu_counter_init(&sbi->s_extent_cache_cnt, 0); - } if (err) { ext4_msg(sb, KERN_ERR, "insufficient memory"); goto failed_mount3; @@ -4191,8 +4189,8 @@ failed_mount_wq: jbd2_journal_destroy(sbi->s_journal); sbi->s_journal = NULL; } -failed_mount3: ext4_es_unregister_shrinker(sbi); +failed_mount3: del_timer_sync(&sbi->s_err_report); if (sbi->s_flex_groups) ext4_kvfree(sbi->s_flex_groups); @@ -4200,7 +4198,6 @@ failed_mount3: percpu_counter_destroy(&sbi->s_freeinodes_counter); percpu_counter_destroy(&sbi->s_dirs_counter); percpu_counter_destroy(&sbi->s_dirtyclusters_counter); - percpu_counter_destroy(&sbi->s_extent_cache_cnt); if (sbi->s_mmp_tsk) kthread_stop(sbi->s_mmp_tsk); failed_mount2: diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h index 35d28f8..a05a16d 100644 --- a/include/trace/events/ext4.h +++ b/include/trace/events/ext4.h @@ -2422,6 +2422,37 @@ TRACE_EVENT(ext4_es_shrink_scan_exit, __entry->nr_shrunk, __entry->cache_cnt) ); +TRACE_EVENT(ext4_es_shrink, + TP_PROTO(struct super_block *sb, int nr_shrunk, u64 scan_time, + int skip_precached, int nr_skipped, int retried), + + TP_ARGS(sb, nr_shrunk, scan_time, skip_precached, nr_skipped, retried), + + TP_STRUCT__entry( + __field( dev_t, dev ) + __field( int, nr_shrunk ) + __field( unsigned long long, scan_time ) + __field( int, skip_precached ) + __field( int, nr_skipped ) + __field( int, retried ) + ), + + TP_fast_assign( + __entry->dev = sb->s_dev; + __entry->nr_shrunk = nr_shrunk; + __entry->scan_time = div_u64(scan_time, 1000); + __entry->skip_precached = skip_precached; + __entry->nr_skipped = nr_skipped; + __entry->retried = retried; + ), + + TP_printk("dev %d,%d nr_shrunk %d, scan_time %llu skip_precached %d " + "nr_skipped %d retried %d", + MAJOR(__entry->dev), MINOR(__entry->dev), __entry->nr_shrunk, + __entry->scan_time, __entry->skip_precached, + __entry->nr_skipped, __entry->retried) +); + #endif /* _TRACE_EXT4_H */ /* This part must be outside protection */ -- 1.7.9.7 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [RFC PATCH 2/2] ext4: track extent status tree shrinker delay statictics 2014-01-14 8:06 ` [RFC PATCH 2/2] ext4: track extent status tree shrinker delay statictics Zheng Liu @ 2014-01-14 13:24 ` Jan Kara 0 siblings, 0 replies; 8+ messages in thread From: Jan Kara @ 2014-01-14 13:24 UTC (permalink / raw) To: Zheng Liu Cc: linux-ext4, Theodore Ts'o, Andreas Dilger, Jan Kara, Zheng Liu On Tue 14-01-14 16:06:41, Zheng Liu wrote: > From: Zheng Liu <wenqing.lz@taobao.com> > > This commit adds some statictics in extent status tree shrinker. The > purpose to add these is that we want to collect more details when we > encounter a stall caused by extent status tree shrinker. Here we count > the following statictics: > stats: > the number of all objects on all extent status trees > the number of reclaimable objects on lru list > the last sorted interval > the number of inodes on lru list > average: > scan time for shrinking some objects > the number of shrunk objects > maximum: > the inode that has max nr. of objects on lru list > > The output looks like below: > $ cat /proc/fs/ext4/sda1/es_shrinker_info > stats: > 28228 objects > 6341 reclaimable objects > 586ms last sorted interval > 250 inodes on lru list > average: > 153us scan time > 128 shrunk objects > maximum: > 255 inode (255 objects, 198 reclaimable) > > If the lru list has never been sorted, the following line will not be > printed: > 586ms last sorted interval > If there is an empty lru list, the following lines also will not be > printed: > 250 inodes on lru list > ... > maximum: > 255 inode (255 objects, 198 reclaimable) > > Meanwhile in this commit a new trace point is defined to print some > details in __ext4_es_shrink(). The patch looks good. You can add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > > Cc: "Theodore Ts'o" <tytso@mit.edu> > Cc: Andreas Dilger <adilger.kernel@dilger.ca> > Cc: Jan Kara <jack@suse.cz> > Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> > --- > fs/ext4/ext4.h | 4 +- > fs/ext4/extents_status.c | 171 ++++++++++++++++++++++++++++++++++++++++--- > fs/ext4/extents_status.h | 10 ++- > fs/ext4/super.c | 17 ++--- > include/trace/events/ext4.h | 31 ++++++++ > 5 files changed, 210 insertions(+), 23 deletions(-) > > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h > index ece5556..23ce6b7 100644 > --- a/fs/ext4/ext4.h > +++ b/fs/ext4/ext4.h > @@ -885,6 +885,7 @@ struct ext4_inode_info { > struct ext4_es_tree i_es_tree; > rwlock_t i_es_lock; > struct list_head i_es_lru; > + unsigned int i_es_all_nr; /* protected by i_es_lock */ > unsigned int i_es_lru_nr; /* protected by i_es_lock */ > unsigned long i_touch_when; /* jiffies of last accessing */ > > @@ -1322,8 +1323,7 @@ struct ext4_sb_info { > /* Reclaim extents from extent status tree */ > struct shrinker s_es_shrinker; > struct list_head s_es_lru; > - unsigned long s_es_last_sorted; > - struct percpu_counter s_extent_cache_cnt; > + struct ext4_es_stats s_es_stats; > spinlock_t s_es_lru_lock ____cacheline_aligned_in_smp; > > /* Ratelimit ext4 messages. */ > diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c > index eb7ae61..5cb25c5 100644 > --- a/fs/ext4/extents_status.c > +++ b/fs/ext4/extents_status.c > @@ -11,6 +11,8 @@ > */ > #include <linux/rbtree.h> > #include <linux/list_sort.h> > +#include <linux/proc_fs.h> > +#include <linux/seq_file.h> > #include "ext4.h" > #include "extents_status.h" > > @@ -313,19 +315,27 @@ ext4_es_alloc_extent(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len, > */ > if (!ext4_es_is_delayed(es)) { > EXT4_I(inode)->i_es_lru_nr++; > - percpu_counter_inc(&EXT4_SB(inode->i_sb)->s_extent_cache_cnt); > + percpu_counter_inc(&EXT4_SB(inode->i_sb)-> > + s_es_stats.es_stats_lru_cnt); > } > > + EXT4_I(inode)->i_es_all_nr++; > + percpu_counter_inc(&EXT4_SB(inode->i_sb)->s_es_stats.es_stats_all_cnt); > + > return es; > } > > static void ext4_es_free_extent(struct inode *inode, struct extent_status *es) > { > + EXT4_I(inode)->i_es_all_nr--; > + percpu_counter_dec(&EXT4_SB(inode->i_sb)->s_es_stats.es_stats_all_cnt); > + > /* Decrease the lru counter when this es is not delayed */ > if (!ext4_es_is_delayed(es)) { > BUG_ON(EXT4_I(inode)->i_es_lru_nr == 0); > EXT4_I(inode)->i_es_lru_nr--; > - percpu_counter_dec(&EXT4_SB(inode->i_sb)->s_extent_cache_cnt); > + percpu_counter_dec(&EXT4_SB(inode->i_sb)-> > + s_es_stats.es_stats_lru_cnt); > } > > kmem_cache_free(ext4_es_cachep, es); > @@ -929,11 +939,16 @@ static int __ext4_es_shrink(struct ext4_sb_info *sbi, int nr_to_scan, > struct ext4_inode_info *locked_ei) > { > struct ext4_inode_info *ei; > + struct ext4_es_stats *es_stats; > struct list_head *cur, *tmp; > LIST_HEAD(skipped); > + ktime_t start_time; > + u64 scan_time; > int nr_shrunk = 0; > int retried = 0, skip_precached = 1, nr_skipped = 0; > > + es_stats = &sbi->s_es_stats; > + start_time = ktime_get(); > spin_lock(&sbi->s_es_lru_lock); > > retry: > @@ -944,7 +959,8 @@ retry: > * If we have already reclaimed all extents from extent > * status tree, just stop the loop immediately. > */ > - if (percpu_counter_read_positive(&sbi->s_extent_cache_cnt) == 0) > + if (percpu_counter_read_positive( > + &es_stats->es_stats_lru_cnt) == 0) > break; > > ei = list_entry(cur, struct ext4_inode_info, i_es_lru); > @@ -954,7 +970,7 @@ retry: > * time. Normally we try hard to avoid shrinking > * precached inodes, but we will as a last resort. > */ > - if ((sbi->s_es_last_sorted < ei->i_touch_when) || > + if ((es_stats->es_stats_last_sorted < ei->i_touch_when) || > (skip_precached && ext4_test_inode_state(&ei->vfs_inode, > EXT4_STATE_EXT_PRECACHED))) { > nr_skipped++; > @@ -988,7 +1004,7 @@ retry: > if ((nr_shrunk == 0) && nr_skipped && !retried) { > retried++; > list_sort(NULL, &sbi->s_es_lru, ext4_inode_touch_time_cmp); > - sbi->s_es_last_sorted = jiffies; > + es_stats->es_stats_last_sorted = jiffies; > ei = list_first_entry(&sbi->s_es_lru, struct ext4_inode_info, > i_es_lru); > /* > @@ -1006,6 +1022,20 @@ retry: > if (locked_ei && nr_shrunk == 0) > nr_shrunk = __es_try_to_reclaim_extents(locked_ei, nr_to_scan); > > + scan_time = ktime_to_ns(ktime_sub(ktime_get(), start_time)); > + if (likely(es_stats->es_stats_scan_time)) > + es_stats->es_stats_scan_time = (scan_time + > + es_stats->es_stats_scan_time*3) / 4; > + else > + es_stats->es_stats_scan_time = scan_time; > + if (likely(es_stats->es_stats_shrunk)) > + es_stats->es_stats_shrunk = (nr_shrunk + > + es_stats->es_stats_shrunk*3) / 4; > + else > + es_stats->es_stats_shrunk = nr_shrunk; > + > + trace_ext4_es_shrink(sbi->s_sb, nr_shrunk, scan_time, skip_precached, > + nr_skipped, retried); > return nr_shrunk; > } > > @@ -1016,7 +1046,7 @@ static unsigned long ext4_es_count(struct shrinker *shrink, > struct ext4_sb_info *sbi; > > sbi = container_of(shrink, struct ext4_sb_info, s_es_shrinker); > - nr = percpu_counter_read_positive(&sbi->s_extent_cache_cnt); > + nr = percpu_counter_read_positive(&sbi->s_es_stats.es_stats_lru_cnt); > trace_ext4_es_shrink_count(sbi->s_sb, sc->nr_to_scan, nr); > return nr; > } > @@ -1029,7 +1059,7 @@ static unsigned long ext4_es_scan(struct shrinker *shrink, > int nr_to_scan = sc->nr_to_scan; > int ret, nr_shrunk; > > - ret = percpu_counter_read_positive(&sbi->s_extent_cache_cnt); > + ret = percpu_counter_read_positive(&sbi->s_es_stats.es_stats_lru_cnt); > trace_ext4_es_shrink_scan_enter(sbi->s_sb, nr_to_scan, ret); > > if (!nr_to_scan) > @@ -1041,19 +1071,140 @@ static unsigned long ext4_es_scan(struct shrinker *shrink, > return nr_shrunk; > } > > -void ext4_es_register_shrinker(struct ext4_sb_info *sbi) > +static void *ext4_es_seq_shrinker_info_start(struct seq_file *seq, loff_t *pos) > +{ > + return *pos ? NULL : SEQ_START_TOKEN; > +} > + > +static void * > +ext4_es_seq_shrinker_info_next(struct seq_file *seq, void *v, loff_t *pos) > +{ > + return NULL; > +} > + > +static int ext4_es_seq_shrinker_info_show(struct seq_file *seq, void *v) > +{ > + struct ext4_sb_info *sbi = seq->private; > + struct ext4_es_stats *es_stats = &sbi->s_es_stats; > + struct ext4_inode_info *ei, *max = NULL; > + unsigned int inode_cnt = 0; > + > + if (v != SEQ_START_TOKEN) > + return 0; > + > + /* here we just find an inode that has the max nr. of objects */ > + spin_lock(&sbi->s_es_lru_lock); > + list_for_each_entry(ei, &sbi->s_es_lru, i_es_lru) { > + inode_cnt++; > + if (max && max->i_es_all_nr < ei->i_es_all_nr) > + max = ei; > + else if (!max) > + max = ei; > + } > + spin_unlock(&sbi->s_es_lru_lock); > + > + seq_printf(seq, "stats:\n %lld objects\n %lld reclaimable objects\n", > + percpu_counter_sum_positive(&es_stats->es_stats_all_cnt), > + percpu_counter_sum_positive(&es_stats->es_stats_lru_cnt)); > + if (es_stats->es_stats_last_sorted != 0) > + seq_printf(seq, " %ums last sorted interval\n", > + jiffies_to_msecs(jiffies - > + es_stats->es_stats_last_sorted)); > + if (inode_cnt) > + seq_printf(seq, " %d inodes on lru list\n", inode_cnt); > + > + seq_printf(seq, "average:\n %lluus scan time\n", > + div_u64(es_stats->es_stats_scan_time, 1000)); > + seq_printf(seq, " %lu shrunk objects\n", es_stats->es_stats_shrunk); > + if (inode_cnt) > + seq_printf(seq, > + "maximum:\n %lu inode (%u objects, %u reclaimable)\n", > + max->vfs_inode.i_ino, max->i_es_all_nr, max->i_es_lru_nr); > + > + return 0; > +} > + > +static void ext4_es_seq_shrinker_info_stop(struct seq_file *seq, void *v) > { > +} > + > +static const struct seq_operations ext4_es_seq_shrinker_info_ops = { > + .start = ext4_es_seq_shrinker_info_start, > + .next = ext4_es_seq_shrinker_info_next, > + .stop = ext4_es_seq_shrinker_info_stop, > + .show = ext4_es_seq_shrinker_info_show, > +}; > + > +static int > +ext4_es_seq_shrinker_info_open(struct inode *inode, struct file *file) > +{ > + int ret; > + > + ret = seq_open(file, &ext4_es_seq_shrinker_info_ops); > + if (!ret) { > + struct seq_file *m = file->private_data; > + m->private = PDE_DATA(inode); > + } > + > + return ret; > +} > + > +static int > +ext4_es_seq_shrinker_info_release(struct inode *inode, struct file *file) > +{ > + return seq_release(inode, file); > +} > + > +static const struct file_operations ext4_es_seq_shrinker_info_fops = { > + .owner = THIS_MODULE, > + .open = ext4_es_seq_shrinker_info_open, > + .read = seq_read, > + .llseek = seq_lseek, > + .release = ext4_es_seq_shrinker_info_release, > +}; > + > +int ext4_es_register_shrinker(struct ext4_sb_info *sbi) > +{ > + int err; > + > INIT_LIST_HEAD(&sbi->s_es_lru); > spin_lock_init(&sbi->s_es_lru_lock); > - sbi->s_es_last_sorted = 0; > + sbi->s_es_stats.es_stats_last_sorted = 0; > + sbi->s_es_stats.es_stats_shrunk = 0; > + sbi->s_es_stats.es_stats_scan_time = 0; > + err = percpu_counter_init(&sbi->s_es_stats.es_stats_all_cnt, 0); > + if (err) > + return err; > + err = percpu_counter_init(&sbi->s_es_stats.es_stats_lru_cnt, 0); > + if (err) > + goto err1; > + > sbi->s_es_shrinker.scan_objects = ext4_es_scan; > sbi->s_es_shrinker.count_objects = ext4_es_count; > sbi->s_es_shrinker.seeks = DEFAULT_SEEKS; > - register_shrinker(&sbi->s_es_shrinker); > + err = register_shrinker(&sbi->s_es_shrinker); > + if (err) > + goto err2; > + > + if (sbi->s_proc) > + proc_create_data("es_shrinker_info", S_IRUGO, sbi->s_proc, > + &ext4_es_seq_shrinker_info_fops, sbi); > + > + return 0; > + > +err2: > + percpu_counter_destroy(&sbi->s_es_stats.es_stats_lru_cnt); > +err1: > + percpu_counter_destroy(&sbi->s_es_stats.es_stats_all_cnt); > + return err; > } > > void ext4_es_unregister_shrinker(struct ext4_sb_info *sbi) > { > + if (sbi->s_proc) > + remove_proc_entry("es_shrinker_info", sbi->s_proc); > + percpu_counter_destroy(&sbi->s_es_stats.es_stats_all_cnt); > + percpu_counter_destroy(&sbi->s_es_stats.es_stats_lru_cnt); > unregister_shrinker(&sbi->s_es_shrinker); > } > > diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h > index 167f4ab8..37a4313 100644 > --- a/fs/ext4/extents_status.h > +++ b/fs/ext4/extents_status.h > @@ -64,6 +64,14 @@ struct ext4_es_tree { > struct extent_status *cache_es; /* recently accessed extent */ > }; > > +struct ext4_es_stats { > + unsigned long es_stats_last_sorted; > + unsigned long es_stats_shrunk; > + u64 es_stats_scan_time; > + struct percpu_counter es_stats_all_cnt; > + struct percpu_counter es_stats_lru_cnt; > +}; > + > extern int __init ext4_init_es(void); > extern void ext4_exit_es(void); > extern void ext4_es_init_tree(struct ext4_es_tree *tree); > @@ -129,7 +137,7 @@ static inline void ext4_es_store_status(struct extent_status *es, > (es->es_pblk & ~ES_MASK)); > } > > -extern void ext4_es_register_shrinker(struct ext4_sb_info *sbi); > +extern int ext4_es_register_shrinker(struct ext4_sb_info *sbi); > extern void ext4_es_unregister_shrinker(struct ext4_sb_info *sbi); > extern void ext4_es_lru_add(struct inode *inode); > extern void ext4_es_lru_del(struct inode *inode); > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > index 1f7784d..9c3a8bd 100644 > --- a/fs/ext4/super.c > +++ b/fs/ext4/super.c > @@ -819,7 +819,6 @@ static void ext4_put_super(struct super_block *sb) > percpu_counter_destroy(&sbi->s_freeinodes_counter); > percpu_counter_destroy(&sbi->s_dirs_counter); > percpu_counter_destroy(&sbi->s_dirtyclusters_counter); > - percpu_counter_destroy(&sbi->s_extent_cache_cnt); > brelse(sbi->s_sbh); > #ifdef CONFIG_QUOTA > for (i = 0; i < MAXQUOTAS; i++) > @@ -879,6 +878,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb) > ext4_es_init_tree(&ei->i_es_tree); > rwlock_init(&ei->i_es_lock); > INIT_LIST_HEAD(&ei->i_es_lru); > + ei->i_es_all_nr = 0; > ei->i_es_lru_nr = 0; > ei->i_touch_when = 0; > ei->i_reserved_data_blocks = 0; > @@ -3869,10 +3869,11 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) > sbi->s_err_report.data = (unsigned long) sb; > > /* Register extent status tree shrinker */ > - ext4_es_register_shrinker(sbi); > - > - err = percpu_counter_init(&sbi->s_freeclusters_counter, > - ext4_count_free_clusters(sb)); > + err = ext4_es_register_shrinker(sbi); > + if (!err) { > + err = percpu_counter_init(&sbi->s_freeclusters_counter, > + ext4_count_free_clusters(sb)); > + } > if (!err) { > err = percpu_counter_init(&sbi->s_freeinodes_counter, > ext4_count_free_inodes(sb)); > @@ -3884,9 +3885,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) > if (!err) { > err = percpu_counter_init(&sbi->s_dirtyclusters_counter, 0); > } > - if (!err) { > - err = percpu_counter_init(&sbi->s_extent_cache_cnt, 0); > - } > if (err) { > ext4_msg(sb, KERN_ERR, "insufficient memory"); > goto failed_mount3; > @@ -4191,8 +4189,8 @@ failed_mount_wq: > jbd2_journal_destroy(sbi->s_journal); > sbi->s_journal = NULL; > } > -failed_mount3: > ext4_es_unregister_shrinker(sbi); > +failed_mount3: > del_timer_sync(&sbi->s_err_report); > if (sbi->s_flex_groups) > ext4_kvfree(sbi->s_flex_groups); > @@ -4200,7 +4198,6 @@ failed_mount3: > percpu_counter_destroy(&sbi->s_freeinodes_counter); > percpu_counter_destroy(&sbi->s_dirs_counter); > percpu_counter_destroy(&sbi->s_dirtyclusters_counter); > - percpu_counter_destroy(&sbi->s_extent_cache_cnt); > if (sbi->s_mmp_tsk) > kthread_stop(sbi->s_mmp_tsk); > failed_mount2: > diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h > index 35d28f8..a05a16d 100644 > --- a/include/trace/events/ext4.h > +++ b/include/trace/events/ext4.h > @@ -2422,6 +2422,37 @@ TRACE_EVENT(ext4_es_shrink_scan_exit, > __entry->nr_shrunk, __entry->cache_cnt) > ); > > +TRACE_EVENT(ext4_es_shrink, > + TP_PROTO(struct super_block *sb, int nr_shrunk, u64 scan_time, > + int skip_precached, int nr_skipped, int retried), > + > + TP_ARGS(sb, nr_shrunk, scan_time, skip_precached, nr_skipped, retried), > + > + TP_STRUCT__entry( > + __field( dev_t, dev ) > + __field( int, nr_shrunk ) > + __field( unsigned long long, scan_time ) > + __field( int, skip_precached ) > + __field( int, nr_skipped ) > + __field( int, retried ) > + ), > + > + TP_fast_assign( > + __entry->dev = sb->s_dev; > + __entry->nr_shrunk = nr_shrunk; > + __entry->scan_time = div_u64(scan_time, 1000); > + __entry->skip_precached = skip_precached; > + __entry->nr_skipped = nr_skipped; > + __entry->retried = retried; > + ), > + > + TP_printk("dev %d,%d nr_shrunk %d, scan_time %llu skip_precached %d " > + "nr_skipped %d retried %d", > + MAJOR(__entry->dev), MINOR(__entry->dev), __entry->nr_shrunk, > + __entry->scan_time, __entry->skip_precached, > + __entry->nr_skipped, __entry->retried) > +); > + > #endif /* _TRACE_EXT4_H */ > > /* This part must be outside protection */ > -- > 1.7.9.7 > -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 8+ messages in thread
* [RFC PATCH 0/2] ext4: extents status tree shrinker improvement @ 2013-12-20 10:42 Zheng Liu 2013-12-20 10:42 ` [RFC PATCH 1/2] ext4: improve extents status tree trace point Zheng Liu 0 siblings, 1 reply; 8+ messages in thread From: Zheng Liu @ 2013-12-20 10:42 UTC (permalink / raw) To: linux-ext4; +Cc: Zheng Liu, Theodore Ts'o, Andreas Dilger Hi all, I still have some ideas for improving the extents status tree shrinker. But before doing that, I think I should send out these patches for discussion so that I want to be sure that I am on the right way. I describe my work below, which contains four parts. The first part describes my work of improvment for the shrinker. In second part I describe the test case for measurement. In third part I measure the performance and compare my work with current implememtation on mainline kernel. The last part is TODO list. I describe some ideas that might improve the performance of extents status tree shrinker. Improvement =========== According to this bug report [1], the extents status tree shrinker might causes excessive time stall under heavy memory pressure. Due to lacking some details from this report, I am not sure whether the delalloc is enabled or not. But if I remember correctly Andreas has mentioned that he also notices that es shrinker takes too much time to try to reclaim some entries from extents status tree, and he observes that the app does a lot of buffered writes. So I suspect that the root cause is that currently es shrinker has to scan a lot of delayed entries that don't be reclaimed under some workloads. Hence it burns too much cpu time. 1. https://bugzilla.kernel.org/show_bug.cgi?id=62751 Why don't we discard delayed entry from extents status tree? Now we need to use delayed entry in extents status tree to reserve delayed space for delayed allocation, and support seek_data/hole. So we couldn't discard these delayed entries. In this patch series, it contains two patches. Patch 1 just improves the trace point in extents status tree. Later I will use them to calculate the latency of every function. This is a trival patch. So feel free to pick it up. Patch 2 tries to make es shrinker avoid scanning delayed entry. It adds a evictable list in order to track all reclaimable objects. Meanwhile a hlist_node structure is defined in 'extent_status' structure. The defect of using hlist_node is that it costs extra 1/3 memory space. On 64bits platform, extent_status structure occupies 48 bytes. After that it will take 64 bytes. Then the es shrinker can traverse on this list. It shouldn't encounter any delayed entry. So it reduces the time taking sbi->s_es_lru_lock and ei->i_es_lock due to it don't need to scan all delayed extries. Workload ======== For comparing the improvement, I use fio to generate a workload that could cause some fragmented files. For getting a extreme fragmented tree, I hack the code in ext4_es_can_be_merge() to always return 0. That means we don't try to merge any entry in extents status tree. Meanwhile vm.dirty_ratio, vm.dirty_background_ratio are adjusted to 80 and 60 on my machine. The purpose of doing this is to accumulate more delayed entries on extents status tree. The configure file of fio is like below: [global] ioengine=psync bs=4k directory=/mnt/sda1 thread group_reporting fallocate=0 direct=0 filesize=10g size=20g runtime=300 [io] rw=randwrite:32 rw_sequencer=sequential numjobs=25 nrfiles=10 sysctl settings: #!/bin/bash # # This script is used to adjust sysctl parameter to increase the memory # pressure and accumluate some delayed entries. sudo sysctl vm.dirty_ratio=80 sudo sysctl vm.dirty_background_ratio=60 Using this workload, the extents status tree will look like this: [0, 1)[1, 2)...[31, 32)[32, 51)[51, 52)[52, 53)...[101, 102)...[111, 112) [ D ][ D ]...[ D ][ H ][ D ][ D ]...[ W ]...[ U ] D: Delayed entry H: Hole entry W: Written entry U: Unwritten entry *NOTE* This workload couldn't reproduce the problem that is reported by [1]. I run this workload on my desktop, which has a Intel Core Duo CPU @ 3.0 GHz, 4G memeory and a HDD. I will run the same workload on a server to measure the result of improvement. Comparison =========== slabtop ------- slabtop is used to measure memory space which the slab of extent_status occupies. The following is the result of $(slabtop -o) vanilla: 718890 688677 95% 0.04K 7730 93 30920K ext4_extent_status patched: 718420 675933 94% 0.05K 10565 68 42260K ext4_extent_status vanilla patched +/- (%) ext4_extent_status 30920k 42260k 36.68 As I said above, adding hlist_node adds ~1/3 extra memory space. perf record -g -------------- The following is the result of $(perf diff), which compares the result of $(perf record -g). Here I cut all *_es_* functions and paste them here. # Event 'cycles' # # Baseline Delta Shared Object Symbol # ........ ....... .................. ............................................ # 1.82% +0.04% [ext4] [k] ext4_es_lookup_extent 0.55% +0.01% [ext4] [k] __es_tree_search 0.39% -0.05% [ext4] [k] __es_insert_extent 0.10% +0.02% [ext4] [k] ext4_es_insert_extent 0.08% +0.01% [ext4] [k] __es_remove_extent 0.07% +0.01% [ext4] [k] ext4_es_lru_add 0.01% -0.01% [ext4] [k] __es_try_to_reclaim_extents 0.00% [ext4] [k] ext4_es_free_extent 0.00% [ext4] [k] ext4_es_cache_extent perf lock record -g ------------------- Here is the result of $(perf lock report -k wait_max). Name acquired contended avg wait (ns) total wait (ns) max wait (ns) min wait (ns) vanilla: &(&sbi->s_es_lru... 152 1 87275 87275 87275 87275 patched: &(&sbi->s_es_lru... 148 0 0 0 0 0 'sbi->s_es_lru_lock' protects the entire lru list of inodes which have reclaimable entries. When es shrinker tries to reclaim some entries, this lock will be taken during this process. From the result we can see that the improvement can reduce the waiting time. latency ------- Here I calculate the latency of the following operations from the result of trace points. (Unit: *us*) - es_find_delayed_extent_range - x vanilla + patched N Min Max Median Avg Stddev x 4425 0 54 1 1.0280226 1.0898311 + 7063 0 140 1 1.0127425 1.8662003 - es_insert_extent - x vanilla + patched N Min Max Median Avg Stddev x 26995 1 1270 2 4.1053899 20.26818 + 26559 1 440 2 3.0948454 7.6911348 - es_lookup_extent - x vanilla + patched N Min Max Median Avg Stddev x 32025 0 104 1 1.6082748 1.385739 + 31433 0 168 1 1.6219578 1.4846153 - es_shrink_scan - x vanilla + patched N Min Max Median Avg Stddev x 1219 24 1872 171 278.35603 275.8815 + 1405 17 339 86 95.057651 33.216144 The result of latency shows that the patch can reduce the latency when es shrinker tries to reclaim the entry. Meanwhile other operations don't be impacted too much. TODO list ========= Numa-aware shrinker ------------------- Now the shrinker has been numa-aware. But the es shrinker doesn't support it. So an improvement is to make es shrinker numa-aware. The key issue is that extent_status objects might not be on the same node with its inode. So that means we need to use lru_list on ext4_sb_info in order to make s_es_lru list numa-aware, and use lru_list on ext4_es_tree for making evictable list numa-aware. I am not sure it is worthwhile. rcu-skiplist ------------ Chris Mason has sent out a rcu-skiplist patch set. So an potential improvement here is to use rcu-skiplist in extents status tree to track all entries. Now we use rb-tree to do the same thing. But the defect of using rb-tree here is that we need to rotate the subtree when a node is inserted or removed. That means that rb-tree couldn't provide a very stable latency when it tries to balance the subtree. The skiplist does not need to do this. So I suspect that it would provide more smooth latency. Regards, - Zheng Zheng Liu (2): ext4: improve extents status tree trace point ext4: improve extents status tree shrinker to avoid scanning delayed entries fs/ext4/extents_status.c | 53 +++++++++++++++++++++++-------------------- fs/ext4/extents_status.h | 2 ++ include/trace/events/ext4.h | 46 +++++++++++++++++++++++++++++++++---- 3 files changed, 71 insertions(+), 30 deletions(-) -- 1.7.9.7 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [RFC PATCH 1/2] ext4: improve extents status tree trace point 2013-12-20 10:42 [RFC PATCH 0/2] ext4: extents status tree shrinker improvement Zheng Liu @ 2013-12-20 10:42 ` Zheng Liu 2013-12-23 8:39 ` Jan Kara 0 siblings, 1 reply; 8+ messages in thread From: Zheng Liu @ 2013-12-20 10:42 UTC (permalink / raw) To: linux-ext4; +Cc: Theodore Ts'o, Andreas Dilger, Zheng Liu From: Zheng Liu <wenqing.lz@taobao.com> This commit improves the trace point of extents status tree. For improving the shrinker, we need to collect more details from trace point. First we put more enter/exit pairs in insertion, deletion, and caching code path in order to collect the latency of these code path. Then we rename trace_ext4_es_shrink_enter in ext4_es_count() because it is also used in ext4_es_scan() and we don't identify them from the result. Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> --- fs/ext4/extents_status.c | 15 ++++++++------ include/trace/events/ext4.h | 46 ++++++++++++++++++++++++++++++++++++++----- 2 files changed, 50 insertions(+), 11 deletions(-) diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c index 3981ff7..e842d74 100644 --- a/fs/ext4/extents_status.c +++ b/fs/ext4/extents_status.c @@ -660,7 +660,7 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk, newes.es_len = len; ext4_es_store_pblock(&newes, pblk); ext4_es_store_status(&newes, status); - trace_ext4_es_insert_extent(inode, &newes); + trace_ext4_es_insert_extent_enter(inode, &newes); ext4_es_insert_extent_check(inode, &newes); @@ -678,6 +678,7 @@ retry: error: write_unlock(&EXT4_I(inode)->i_es_lock); + trace_ext4_es_insert_extent_exit(inode, &newes); ext4_es_print_tree(inode); @@ -701,7 +702,7 @@ void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk, newes.es_len = len; ext4_es_store_pblock(&newes, pblk); ext4_es_store_status(&newes, status); - trace_ext4_es_cache_extent(inode, &newes); + trace_ext4_es_cache_extent_enter(inode, &newes); if (!len) return; @@ -714,6 +715,7 @@ void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk, if (!es || es->es_lblk > end) __es_insert_extent(inode, &newes); write_unlock(&EXT4_I(inode)->i_es_lock); + trace_ext4_es_cache_extent_exit(inode, &newes); } /* @@ -887,7 +889,7 @@ int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t end; int err = 0; - trace_ext4_es_remove_extent(inode, lblk, len); + trace_ext4_es_remove_extent_enter(inode, lblk, len); es_debug("remove [%u/%u) from extent status tree of inode %lu\n", lblk, len, inode->i_ino); @@ -901,6 +903,7 @@ int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk, err = __es_remove_extent(inode, lblk, end); write_unlock(&EXT4_I(inode)->i_es_lock); ext4_es_print_tree(inode); + trace_ext4_es_remove_extent_exit(inode, lblk, len); return err; } @@ -1017,7 +1020,7 @@ static unsigned long ext4_es_count(struct shrinker *shrink, sbi = container_of(shrink, struct ext4_sb_info, s_es_shrinker); nr = percpu_counter_read_positive(&sbi->s_extent_cache_cnt); - trace_ext4_es_shrink_enter(sbi->s_sb, sc->nr_to_scan, nr); + trace_ext4_es_shrink_count(sbi->s_sb, sc->nr_to_scan, nr); return nr; } @@ -1030,14 +1033,14 @@ static unsigned long ext4_es_scan(struct shrinker *shrink, int ret, nr_shrunk; ret = percpu_counter_read_positive(&sbi->s_extent_cache_cnt); - trace_ext4_es_shrink_enter(sbi->s_sb, nr_to_scan, ret); + trace_ext4_es_shrink_scan_enter(sbi->s_sb, nr_to_scan, ret); if (!nr_to_scan) return ret; nr_shrunk = __ext4_es_shrink(sbi, nr_to_scan, NULL); - trace_ext4_es_shrink_exit(sbi->s_sb, nr_shrunk, ret); + trace_ext4_es_shrink_scan_exit(sbi->s_sb, nr_shrunk, ret); return nr_shrunk; } diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h index 197d312..8b3ff82 100644 --- a/include/trace/events/ext4.h +++ b/include/trace/events/ext4.h @@ -2221,19 +2221,31 @@ DECLARE_EVENT_CLASS(ext4__es_extent, __entry->pblk, show_extent_status(__entry->status)) ); -DEFINE_EVENT(ext4__es_extent, ext4_es_insert_extent, +DEFINE_EVENT(ext4__es_extent, ext4_es_insert_extent_enter, TP_PROTO(struct inode *inode, struct extent_status *es), TP_ARGS(inode, es) ); -DEFINE_EVENT(ext4__es_extent, ext4_es_cache_extent, +DEFINE_EVENT(ext4__es_extent, ext4_es_insert_extent_exit, TP_PROTO(struct inode *inode, struct extent_status *es), TP_ARGS(inode, es) ); -TRACE_EVENT(ext4_es_remove_extent, +DEFINE_EVENT(ext4__es_extent, ext4_es_cache_extent_enter, + TP_PROTO(struct inode *inode, struct extent_status *es), + + TP_ARGS(inode, es) +); + +DEFINE_EVENT(ext4__es_extent, ext4_es_cache_extent_exit, + TP_PROTO(struct inode *inode, struct extent_status *es), + + TP_ARGS(inode, es) +); + +DECLARE_EVENT_CLASS(ext4__es_remove_extent, TP_PROTO(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len), TP_ARGS(inode, lblk, len), @@ -2258,6 +2270,18 @@ TRACE_EVENT(ext4_es_remove_extent, __entry->lblk, __entry->len) ); +DEFINE_EVENT(ext4__es_remove_extent, ext4_es_remove_extent_enter, + TP_PROTO(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len), + + TP_ARGS(inode, lblk, len) +); + +DEFINE_EVENT(ext4__es_remove_extent, ext4_es_remove_extent_exit, + TP_PROTO(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len), + + TP_ARGS(inode, lblk, len) +); + TRACE_EVENT(ext4_es_find_delayed_extent_range_enter, TP_PROTO(struct inode *inode, ext4_lblk_t lblk), @@ -2366,7 +2390,7 @@ TRACE_EVENT(ext4_es_lookup_extent_exit, show_extent_status(__entry->found ? __entry->status : 0)) ); -TRACE_EVENT(ext4_es_shrink_enter, +DECLARE_EVENT_CLASS(ext4__es_shrink_enter, TP_PROTO(struct super_block *sb, int nr_to_scan, int cache_cnt), TP_ARGS(sb, nr_to_scan, cache_cnt), @@ -2388,7 +2412,19 @@ TRACE_EVENT(ext4_es_shrink_enter, __entry->nr_to_scan, __entry->cache_cnt) ); -TRACE_EVENT(ext4_es_shrink_exit, +DEFINE_EVENT(ext4__es_shrink_enter, ext4_es_shrink_count, + TP_PROTO(struct super_block *sb, int nr_to_scan, int cache_cnt), + + TP_ARGS(sb, nr_to_scan, cache_cnt) +); + +DEFINE_EVENT(ext4__es_shrink_enter, ext4_es_shrink_scan_enter, + TP_PROTO(struct super_block *sb, int nr_to_scan, int cache_cnt), + + TP_ARGS(sb, nr_to_scan, cache_cnt) +); + +TRACE_EVENT(ext4_es_shrink_scan_exit, TP_PROTO(struct super_block *sb, int shrunk_nr, int cache_cnt), TP_ARGS(sb, shrunk_nr, cache_cnt), -- 1.7.9.7 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [RFC PATCH 1/2] ext4: improve extents status tree trace point 2013-12-20 10:42 ` [RFC PATCH 1/2] ext4: improve extents status tree trace point Zheng Liu @ 2013-12-23 8:39 ` Jan Kara 2013-12-25 3:23 ` Zheng Liu 0 siblings, 1 reply; 8+ messages in thread From: Jan Kara @ 2013-12-23 8:39 UTC (permalink / raw) To: Zheng Liu; +Cc: linux-ext4, Theodore Ts'o, Andreas Dilger, Zheng Liu On Fri 20-12-13 18:42:44, Zheng Liu wrote: > From: Zheng Liu <wenqing.lz@taobao.com> > > This commit improves the trace point of extents status tree. For > improving the shrinker, we need to collect more details from trace > point. First we put more enter/exit pairs in insertion, deletion, > and caching code path in order to collect the latency of these code > path. Then we rename trace_ext4_es_shrink_enter in ext4_es_count() > because it is also used in ext4_es_scan() and we don't identify > them from the result. Umm, I think you can use a generic 'function trace' infrastructure to trace entry & exit into arbitrary functions in kernel. So there shouldn't be a need to introduce extra entry & exit tracepoints. Honza > Cc: "Theodore Ts'o" <tytso@mit.edu> > Cc: Andreas Dilger <adilger.kernel@dilger.ca> > Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> > --- > fs/ext4/extents_status.c | 15 ++++++++------ > include/trace/events/ext4.h | 46 ++++++++++++++++++++++++++++++++++++++----- > 2 files changed, 50 insertions(+), 11 deletions(-) > > diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c > index 3981ff7..e842d74 100644 > --- a/fs/ext4/extents_status.c > +++ b/fs/ext4/extents_status.c > @@ -660,7 +660,7 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk, > newes.es_len = len; > ext4_es_store_pblock(&newes, pblk); > ext4_es_store_status(&newes, status); > - trace_ext4_es_insert_extent(inode, &newes); > + trace_ext4_es_insert_extent_enter(inode, &newes); > > ext4_es_insert_extent_check(inode, &newes); > > @@ -678,6 +678,7 @@ retry: > > error: > write_unlock(&EXT4_I(inode)->i_es_lock); > + trace_ext4_es_insert_extent_exit(inode, &newes); > > ext4_es_print_tree(inode); > > @@ -701,7 +702,7 @@ void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk, > newes.es_len = len; > ext4_es_store_pblock(&newes, pblk); > ext4_es_store_status(&newes, status); > - trace_ext4_es_cache_extent(inode, &newes); > + trace_ext4_es_cache_extent_enter(inode, &newes); > > if (!len) > return; > @@ -714,6 +715,7 @@ void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk, > if (!es || es->es_lblk > end) > __es_insert_extent(inode, &newes); > write_unlock(&EXT4_I(inode)->i_es_lock); > + trace_ext4_es_cache_extent_exit(inode, &newes); > } > > /* > @@ -887,7 +889,7 @@ int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk, > ext4_lblk_t end; > int err = 0; > > - trace_ext4_es_remove_extent(inode, lblk, len); > + trace_ext4_es_remove_extent_enter(inode, lblk, len); > es_debug("remove [%u/%u) from extent status tree of inode %lu\n", > lblk, len, inode->i_ino); > > @@ -901,6 +903,7 @@ int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk, > err = __es_remove_extent(inode, lblk, end); > write_unlock(&EXT4_I(inode)->i_es_lock); > ext4_es_print_tree(inode); > + trace_ext4_es_remove_extent_exit(inode, lblk, len); > return err; > } > > @@ -1017,7 +1020,7 @@ static unsigned long ext4_es_count(struct shrinker *shrink, > > sbi = container_of(shrink, struct ext4_sb_info, s_es_shrinker); > nr = percpu_counter_read_positive(&sbi->s_extent_cache_cnt); > - trace_ext4_es_shrink_enter(sbi->s_sb, sc->nr_to_scan, nr); > + trace_ext4_es_shrink_count(sbi->s_sb, sc->nr_to_scan, nr); > return nr; > } > > @@ -1030,14 +1033,14 @@ static unsigned long ext4_es_scan(struct shrinker *shrink, > int ret, nr_shrunk; > > ret = percpu_counter_read_positive(&sbi->s_extent_cache_cnt); > - trace_ext4_es_shrink_enter(sbi->s_sb, nr_to_scan, ret); > + trace_ext4_es_shrink_scan_enter(sbi->s_sb, nr_to_scan, ret); > > if (!nr_to_scan) > return ret; > > nr_shrunk = __ext4_es_shrink(sbi, nr_to_scan, NULL); > > - trace_ext4_es_shrink_exit(sbi->s_sb, nr_shrunk, ret); > + trace_ext4_es_shrink_scan_exit(sbi->s_sb, nr_shrunk, ret); > return nr_shrunk; > } > > diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h > index 197d312..8b3ff82 100644 > --- a/include/trace/events/ext4.h > +++ b/include/trace/events/ext4.h > @@ -2221,19 +2221,31 @@ DECLARE_EVENT_CLASS(ext4__es_extent, > __entry->pblk, show_extent_status(__entry->status)) > ); > > -DEFINE_EVENT(ext4__es_extent, ext4_es_insert_extent, > +DEFINE_EVENT(ext4__es_extent, ext4_es_insert_extent_enter, > TP_PROTO(struct inode *inode, struct extent_status *es), > > TP_ARGS(inode, es) > ); > > -DEFINE_EVENT(ext4__es_extent, ext4_es_cache_extent, > +DEFINE_EVENT(ext4__es_extent, ext4_es_insert_extent_exit, > TP_PROTO(struct inode *inode, struct extent_status *es), > > TP_ARGS(inode, es) > ); > > -TRACE_EVENT(ext4_es_remove_extent, > +DEFINE_EVENT(ext4__es_extent, ext4_es_cache_extent_enter, > + TP_PROTO(struct inode *inode, struct extent_status *es), > + > + TP_ARGS(inode, es) > +); > + > +DEFINE_EVENT(ext4__es_extent, ext4_es_cache_extent_exit, > + TP_PROTO(struct inode *inode, struct extent_status *es), > + > + TP_ARGS(inode, es) > +); > + > +DECLARE_EVENT_CLASS(ext4__es_remove_extent, > TP_PROTO(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len), > > TP_ARGS(inode, lblk, len), > @@ -2258,6 +2270,18 @@ TRACE_EVENT(ext4_es_remove_extent, > __entry->lblk, __entry->len) > ); > > +DEFINE_EVENT(ext4__es_remove_extent, ext4_es_remove_extent_enter, > + TP_PROTO(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len), > + > + TP_ARGS(inode, lblk, len) > +); > + > +DEFINE_EVENT(ext4__es_remove_extent, ext4_es_remove_extent_exit, > + TP_PROTO(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len), > + > + TP_ARGS(inode, lblk, len) > +); > + > TRACE_EVENT(ext4_es_find_delayed_extent_range_enter, > TP_PROTO(struct inode *inode, ext4_lblk_t lblk), > > @@ -2366,7 +2390,7 @@ TRACE_EVENT(ext4_es_lookup_extent_exit, > show_extent_status(__entry->found ? __entry->status : 0)) > ); > > -TRACE_EVENT(ext4_es_shrink_enter, > +DECLARE_EVENT_CLASS(ext4__es_shrink_enter, > TP_PROTO(struct super_block *sb, int nr_to_scan, int cache_cnt), > > TP_ARGS(sb, nr_to_scan, cache_cnt), > @@ -2388,7 +2412,19 @@ TRACE_EVENT(ext4_es_shrink_enter, > __entry->nr_to_scan, __entry->cache_cnt) > ); > > -TRACE_EVENT(ext4_es_shrink_exit, > +DEFINE_EVENT(ext4__es_shrink_enter, ext4_es_shrink_count, > + TP_PROTO(struct super_block *sb, int nr_to_scan, int cache_cnt), > + > + TP_ARGS(sb, nr_to_scan, cache_cnt) > +); > + > +DEFINE_EVENT(ext4__es_shrink_enter, ext4_es_shrink_scan_enter, > + TP_PROTO(struct super_block *sb, int nr_to_scan, int cache_cnt), > + > + TP_ARGS(sb, nr_to_scan, cache_cnt) > +); > + > +TRACE_EVENT(ext4_es_shrink_scan_exit, > TP_PROTO(struct super_block *sb, int shrunk_nr, int cache_cnt), > > TP_ARGS(sb, shrunk_nr, cache_cnt), > -- > 1.7.9.7 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH 1/2] ext4: improve extents status tree trace point 2013-12-23 8:39 ` Jan Kara @ 2013-12-25 3:23 ` Zheng Liu 0 siblings, 0 replies; 8+ messages in thread From: Zheng Liu @ 2013-12-25 3:23 UTC (permalink / raw) To: Jan Kara; +Cc: linux-ext4, Theodore Ts'o, Andreas Dilger, Zheng Liu On Mon, Dec 23, 2013 at 09:39:03AM +0100, Jan Kara wrote: > On Fri 20-12-13 18:42:44, Zheng Liu wrote: > > From: Zheng Liu <wenqing.lz@taobao.com> > > > > This commit improves the trace point of extents status tree. For > > improving the shrinker, we need to collect more details from trace > > point. First we put more enter/exit pairs in insertion, deletion, > > and caching code path in order to collect the latency of these code > > path. Then we rename trace_ext4_es_shrink_enter in ext4_es_count() > > because it is also used in ext4_es_scan() and we don't identify > > them from the result. > Umm, I think you can use a generic 'function trace' infrastructure to > trace entry & exit into arbitrary functions in kernel. So there shouldn't > be a need to introduce extra entry & exit tracepoints. Ah, thanks for reminding me. Yes, 'function trace' is extreme useful! - Zheng > > Honza > > > Cc: "Theodore Ts'o" <tytso@mit.edu> > > Cc: Andreas Dilger <adilger.kernel@dilger.ca> > > Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> > > --- > > fs/ext4/extents_status.c | 15 ++++++++------ > > include/trace/events/ext4.h | 46 ++++++++++++++++++++++++++++++++++++++----- > > 2 files changed, 50 insertions(+), 11 deletions(-) > > > > diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c > > index 3981ff7..e842d74 100644 > > --- a/fs/ext4/extents_status.c > > +++ b/fs/ext4/extents_status.c > > @@ -660,7 +660,7 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk, > > newes.es_len = len; > > ext4_es_store_pblock(&newes, pblk); > > ext4_es_store_status(&newes, status); > > - trace_ext4_es_insert_extent(inode, &newes); > > + trace_ext4_es_insert_extent_enter(inode, &newes); > > > > ext4_es_insert_extent_check(inode, &newes); > > > > @@ -678,6 +678,7 @@ retry: > > > > error: > > write_unlock(&EXT4_I(inode)->i_es_lock); > > + trace_ext4_es_insert_extent_exit(inode, &newes); > > > > ext4_es_print_tree(inode); > > > > @@ -701,7 +702,7 @@ void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk, > > newes.es_len = len; > > ext4_es_store_pblock(&newes, pblk); > > ext4_es_store_status(&newes, status); > > - trace_ext4_es_cache_extent(inode, &newes); > > + trace_ext4_es_cache_extent_enter(inode, &newes); > > > > if (!len) > > return; > > @@ -714,6 +715,7 @@ void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk, > > if (!es || es->es_lblk > end) > > __es_insert_extent(inode, &newes); > > write_unlock(&EXT4_I(inode)->i_es_lock); > > + trace_ext4_es_cache_extent_exit(inode, &newes); > > } > > > > /* > > @@ -887,7 +889,7 @@ int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk, > > ext4_lblk_t end; > > int err = 0; > > > > - trace_ext4_es_remove_extent(inode, lblk, len); > > + trace_ext4_es_remove_extent_enter(inode, lblk, len); > > es_debug("remove [%u/%u) from extent status tree of inode %lu\n", > > lblk, len, inode->i_ino); > > > > @@ -901,6 +903,7 @@ int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk, > > err = __es_remove_extent(inode, lblk, end); > > write_unlock(&EXT4_I(inode)->i_es_lock); > > ext4_es_print_tree(inode); > > + trace_ext4_es_remove_extent_exit(inode, lblk, len); > > return err; > > } > > > > @@ -1017,7 +1020,7 @@ static unsigned long ext4_es_count(struct shrinker *shrink, > > > > sbi = container_of(shrink, struct ext4_sb_info, s_es_shrinker); > > nr = percpu_counter_read_positive(&sbi->s_extent_cache_cnt); > > - trace_ext4_es_shrink_enter(sbi->s_sb, sc->nr_to_scan, nr); > > + trace_ext4_es_shrink_count(sbi->s_sb, sc->nr_to_scan, nr); > > return nr; > > } > > > > @@ -1030,14 +1033,14 @@ static unsigned long ext4_es_scan(struct shrinker *shrink, > > int ret, nr_shrunk; > > > > ret = percpu_counter_read_positive(&sbi->s_extent_cache_cnt); > > - trace_ext4_es_shrink_enter(sbi->s_sb, nr_to_scan, ret); > > + trace_ext4_es_shrink_scan_enter(sbi->s_sb, nr_to_scan, ret); > > > > if (!nr_to_scan) > > return ret; > > > > nr_shrunk = __ext4_es_shrink(sbi, nr_to_scan, NULL); > > > > - trace_ext4_es_shrink_exit(sbi->s_sb, nr_shrunk, ret); > > + trace_ext4_es_shrink_scan_exit(sbi->s_sb, nr_shrunk, ret); > > return nr_shrunk; > > } > > > > diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h > > index 197d312..8b3ff82 100644 > > --- a/include/trace/events/ext4.h > > +++ b/include/trace/events/ext4.h > > @@ -2221,19 +2221,31 @@ DECLARE_EVENT_CLASS(ext4__es_extent, > > __entry->pblk, show_extent_status(__entry->status)) > > ); > > > > -DEFINE_EVENT(ext4__es_extent, ext4_es_insert_extent, > > +DEFINE_EVENT(ext4__es_extent, ext4_es_insert_extent_enter, > > TP_PROTO(struct inode *inode, struct extent_status *es), > > > > TP_ARGS(inode, es) > > ); > > > > -DEFINE_EVENT(ext4__es_extent, ext4_es_cache_extent, > > +DEFINE_EVENT(ext4__es_extent, ext4_es_insert_extent_exit, > > TP_PROTO(struct inode *inode, struct extent_status *es), > > > > TP_ARGS(inode, es) > > ); > > > > -TRACE_EVENT(ext4_es_remove_extent, > > +DEFINE_EVENT(ext4__es_extent, ext4_es_cache_extent_enter, > > + TP_PROTO(struct inode *inode, struct extent_status *es), > > + > > + TP_ARGS(inode, es) > > +); > > + > > +DEFINE_EVENT(ext4__es_extent, ext4_es_cache_extent_exit, > > + TP_PROTO(struct inode *inode, struct extent_status *es), > > + > > + TP_ARGS(inode, es) > > +); > > + > > +DECLARE_EVENT_CLASS(ext4__es_remove_extent, > > TP_PROTO(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len), > > > > TP_ARGS(inode, lblk, len), > > @@ -2258,6 +2270,18 @@ TRACE_EVENT(ext4_es_remove_extent, > > __entry->lblk, __entry->len) > > ); > > > > +DEFINE_EVENT(ext4__es_remove_extent, ext4_es_remove_extent_enter, > > + TP_PROTO(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len), > > + > > + TP_ARGS(inode, lblk, len) > > +); > > + > > +DEFINE_EVENT(ext4__es_remove_extent, ext4_es_remove_extent_exit, > > + TP_PROTO(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len), > > + > > + TP_ARGS(inode, lblk, len) > > +); > > + > > TRACE_EVENT(ext4_es_find_delayed_extent_range_enter, > > TP_PROTO(struct inode *inode, ext4_lblk_t lblk), > > > > @@ -2366,7 +2390,7 @@ TRACE_EVENT(ext4_es_lookup_extent_exit, > > show_extent_status(__entry->found ? __entry->status : 0)) > > ); > > > > -TRACE_EVENT(ext4_es_shrink_enter, > > +DECLARE_EVENT_CLASS(ext4__es_shrink_enter, > > TP_PROTO(struct super_block *sb, int nr_to_scan, int cache_cnt), > > > > TP_ARGS(sb, nr_to_scan, cache_cnt), > > @@ -2388,7 +2412,19 @@ TRACE_EVENT(ext4_es_shrink_enter, > > __entry->nr_to_scan, __entry->cache_cnt) > > ); > > > > -TRACE_EVENT(ext4_es_shrink_exit, > > +DEFINE_EVENT(ext4__es_shrink_enter, ext4_es_shrink_count, > > + TP_PROTO(struct super_block *sb, int nr_to_scan, int cache_cnt), > > + > > + TP_ARGS(sb, nr_to_scan, cache_cnt) > > +); > > + > > +DEFINE_EVENT(ext4__es_shrink_enter, ext4_es_shrink_scan_enter, > > + TP_PROTO(struct super_block *sb, int nr_to_scan, int cache_cnt), > > + > > + TP_ARGS(sb, nr_to_scan, cache_cnt) > > +); > > + > > +TRACE_EVENT(ext4_es_shrink_scan_exit, > > TP_PROTO(struct super_block *sb, int shrunk_nr, int cache_cnt), > > > > TP_ARGS(sb, shrunk_nr, cache_cnt), > > -- > > 1.7.9.7 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > Jan Kara <jack@suse.cz> > SUSE Labs, CR ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2014-01-14 13:24 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-01-14 8:06 [RFC PATCH 0/2] add some stats in es shrinker and do some cleanups in tracepoint Zheng Liu 2014-01-14 8:06 ` [RFC PATCH 1/2] ext4: improve extents status tree trace point Zheng Liu 2014-01-14 13:14 ` Jan Kara 2014-01-14 8:06 ` [RFC PATCH 2/2] ext4: track extent status tree shrinker delay statictics Zheng Liu 2014-01-14 13:24 ` Jan Kara -- strict thread matches above, loose matches on Subject: below -- 2013-12-20 10:42 [RFC PATCH 0/2] ext4: extents status tree shrinker improvement Zheng Liu 2013-12-20 10:42 ` [RFC PATCH 1/2] ext4: improve extents status tree trace point Zheng Liu 2013-12-23 8:39 ` Jan Kara 2013-12-25 3:23 ` Zheng Liu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).