* [PATCH, RFC -V2 0/4] mballoc patches for ext4
@ 2009-08-10 3:23 Theodore Ts'o
2009-08-10 3:23 ` [PATCH, RFC -V2 1/4] ext4: Add configurable run-time mballoc debugging Theodore Ts'o
` (3 more replies)
0 siblings, 4 replies; 15+ messages in thread
From: Theodore Ts'o @ 2009-08-10 3:23 UTC (permalink / raw)
To: linux-ext4; +Cc: Andreas Dilger, Alex Tomas, Theodore Ts'o
This patch series adds better debugging support for mballoc, but the
main goal was to improve how small files are allocated.
These patches could use some testing and benchmarking; I suspect we will
a slight increase the CPU required to read and write small files, but
hopefully it won't be significant enough to be significantly
noticeable. More seriously, the hueristics that try to detect lock
contention and so we go back to using per-cpu group preallocation
extents may need some tuning. We may also want to change the group
preallocation code so that instead of requiring an aligned extent of 512
blocks which is completely unused, that a partially used extent can also
be used for group preallocation.
I used the attached test script, named test-allocate, to demonstrate the
benefits of the last two patches in this patch series. (The first two
simply add better debugging and make the flags field in mb_history
easier to understand.) Without the last two patches applied, the
results of a (bug-fixed) e2freefrag looks like this:
Device: /dev/sdc1
Blocksize: 1024 bytes
Total blocks: 5237156
Free blocks: 5117184 (97.7%)
Min. free extent: 5 KB
Max. free extent: 128992 KB
Avg. free extent: 85286 KB
HISTOGRAM OF FREE EXTENT SIZES:
Chunk Size Range : Free chunks Free Blocks Percent
4K... 8K- : 1 5 0.00%
8K... 16K- : 1 11 0.00%
16K... 32K- : 2 44 0.00%
128K... 256K- : 1 235 0.00%
256K... 512K- : 1 407 0.01%
4M... 8M- : 5 30268 0.59%
8M... 16M- : 5 79723 1.56%
16M... 32M- : 2 46795 0.91%
32M... 64M- : 3 153014 2.99%
64M... 128M- : 39 4806682 93.93%
With the full patch series applied the e2freefrag output looks like
this:
Device: /dev/sdc1
Blocksize: 1024 bytes
Total blocks: 5237156
Free blocks: 5117184 (97.7%)
Min. free extent: 1 KB
Max. free extent: 128992 KB
Avg. free extent: 88227 KB
HISTOGRAM OF FREE EXTENTS SIZES:
Chunk Size Range : Free chunks Free Blocks Percent
1K... 2K- : 2 2 0.00%
2K... 4K- : 1 3 0.00%
4K... 8K- : 1 5 0.00%
4M... 8M- : 5 30268 0.59%
8M... 16M- : 5 80415 1.57%
16M... 32M- : 2 46795 0.91%
32M... 64M- : 3 153014 2.99%
64M... 128M- : 39 4806682 93.93%
Compare the histogram of the sub-megabyte free chunks.
Here is the before output of mb_history:
pid inode original goal result found grps cr flags merge tail broken
1920 513 0/0/1@0 0/0/1@0 0/2371/1@0 1 1 1 0x0000 0 0
398 12 1/0/3@0 0/0/512@0 1/512/512@0 1 1 0 0x04a0 0 0
398 13 1/0/5@0 1/515/5@0
398 14 1/0/7@0 1/520/7@0
398 15 1/0/11@0 1/527/11@0
398 16 1/0/13@0 1/538/13@0
398 17 1/0/15@0 1/551/15@0
398 19 1/0/5@0 1/566/5@0
398 20 1/0/7@0 1/571/7@0
398 21 1/0/11@0 1/578/11@0
398 22 1/0/13@0 1/589/13@0
398 23 1/0/15@0 1/602/15@0
1933 18 1/0/3@0 1/512/512@0 1/1024/512@0 1 1 0 0x04a0 512 1024
398 13 1/0/2@0 1/1027/2@0
398 15 1/0/3@0 1/1029/3@0
398 17 1/0/5@0 1/1032/5@0
398 19 1/0/6@0 1/1037/6@0
398 21 1/0/9@0 1/1043/9@0
398 22 1/0/6@0 1/1052/6@0
398 24 1/0/2@0 1/1058/2@0
398 25 1/0/3@0 1/1060/3@0
398 26 1/0/5@0 1/1063/5@0
398 27 1/0/6@0 1/1068/6@0
398 28 1/0/9@0 1/1074/9@0
398 29 1/0/6@0 1/1083/6@0
.... and here is the after:
pid inode original goal result found grps cr flags merge tail broken
1825 513 0/0/1@0 0/0/1@0 0/2371/1@0 1 1 1 0x0000 0 0
397 12 1/0/3@0 1/0/3@0 1/277/3@0 1 1 1 0x0460 0 0
397 13 1/0/5@0 1/0/5@0 1/280/5@0 9 1 1 0x0460 5 8
397 14 1/0/7@0 1/0/7@0 1/285/7@0 8 1 1 0x0460 4 32
397 15 1/0/11@0 1/0/11@0 1/292/11@0 9 1 1 0x0460 7 8
397 16 1/0/13@0 1/0/13@0 1/303/13@0 8 1 1 0x0460 12 16
397 17 1/0/15@0 1/0/15@0 1/316/15@0 7 1 1 0x0460 11 64
397 18 1/0/3@0 1/0/3@0 1/331/3@0 9 1 1 0x0460 2 4
397 19 1/0/5@0 1/0/5@0 1/334/5@0 8 1 1 0x0460 3 16
397 20 1/0/7@0 1/0/7@0 1/339/7@0 8 1 1 0x0460 2 8
397 21 1/0/11@0 1/0/11@0 1/346/11@0 7 1 1 0x0460 5 32
397 22 1/0/13@0 1/0/13@0 1/357/13@0 7 1 1 0x0460 2 16
397 23 1/0/15@0 1/0/15@0 1/370/15@0 6 1 1 0x0460 1 128
1853 13 1/0/2@0 1/0/2@0 1/300/2@0 1 1 0 0x0460 0 0
1853 15 1/0/3@0 1/0/3@0 1/328/3@0 8 1 1 0x0460 0 0
1853 17 1/0/5@0 1/0/5@0 1/280/5@0 1 1 1 0x0460 0 0
1853 19 1/0/6@0 1/0/6@0 1/346/6@0 5 1 1 0x0460 0 0
1853 21 1/0/9@0 1/0/9@0 1/316/9@0 11 1 1 0x0460 5 8
1853 22 1/0/6@0 1/0/6@0 1/385/6@0 11 1 1 0x0460 3 4
1853 24 1/0/2@0 1/0/2@0 1/326/2@0 1 1 0 0x0460 0 0
1853 25 1/0/3@0 1/0/3@0 1/292/3@0 11 1 1 0x0460 3 4
1853 26 1/0/5@0 1/0/5@0 1/295/5@0 1 1 1 0x0460 0 0
1853 27 1/0/6@0 1/0/6@0 1/391/6@0 11 1 1 0x0460 5 8
1853 28 1/0/9@0 1/0/9@0 1/352/9@0 11 1 1 0x0460 9 16
1853 29 1/0/6@0 1/0/6@0 1/361/6@0 11 1 1 0x0460 3 4
- Ted
------------------------ begin test-allocate
#!/bin/bash
function gen_file()
{
dd if=/dev/zero of=$1 bs=1k count=$2
}
mkdir t
cd t
gen_file a 3
gen_file b 5
gen_file c 7
gen_file d 11
gen_file e 13
gen_file f 15
gen_file g 3
gen_file h 5
gen_file i 7
gen_file j 11
gen_file k 13
gen_file l 15
sync
rm b d f h j k
sync
gen_file m 2
gen_file n 3
gen_file o 5
gen_file p 6
gen_file q 9
gen_file r 6
gen_file s 2
gen_file t 3
gen_file u 5
gen_file v 6
gen_file w 9
gen_file x 6
sync
-------------------------------------------------
Theodore Ts'o (4):
ext4: Add configurable run-time mballoc debugging
ext4: Display the mballoc flags in mb_history in hex instead of
decimal
ext4: Fix bugs in mballoc's stream allocation mode
ext4: Avoid group preallocation for closed files
fs/ext4/Kconfig | 9 ++++
fs/ext4/ext4.h | 46 +++++++++++++++-----
fs/ext4/mballoc.c | 116 ++++++++++++++++++++++++++++++++++++-----------------
fs/ext4/mballoc.h | 16 +++++--
4 files changed, 134 insertions(+), 53 deletions(-)
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH, RFC -V2 1/4] ext4: Add configurable run-time mballoc debugging 2009-08-10 3:23 [PATCH, RFC -V2 0/4] mballoc patches for ext4 Theodore Ts'o @ 2009-08-10 3:23 ` Theodore Ts'o 2009-08-10 3:42 ` Eric Sandeen 2009-08-11 18:15 ` Xiang Wang 2009-08-10 3:23 ` [PATCH, RFC -V2 2/4] ext4: Display the mballoc flags in mb_history in hex instead of decimal Theodore Ts'o ` (2 subsequent siblings) 3 siblings, 2 replies; 15+ messages in thread From: Theodore Ts'o @ 2009-08-10 3:23 UTC (permalink / raw) To: linux-ext4; +Cc: Andreas Dilger, Alex Tomas, Theodore Ts'o Allow mballoc debugging to be enabled at run-time instead of just at compile time. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> --- fs/ext4/Kconfig | 9 ++++++ fs/ext4/mballoc.c | 81 ++++++++++++++++++++++++++++++++++++++-------------- fs/ext4/mballoc.h | 16 ++++++++-- 3 files changed, 80 insertions(+), 26 deletions(-) diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig index 1523046..d5c0ea2 100644 --- a/fs/ext4/Kconfig +++ b/fs/ext4/Kconfig @@ -77,3 +77,12 @@ config EXT4_FS_SECURITY If you are not using a security module that requires using extended attributes for file security labels, say N. + +config EXT4_DEBUG + bool "EXT4 debugging support" + depends on EXT4_FS + help + Enables run-time debugging support for the ext4 filesystem. + + If you select Y here, then you will be able to turn on debugging + with a command such as "echo 1 > /sys/kernel/debug/ext4/mballoc-debug" diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 68cde59..2c81240 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -22,6 +22,7 @@ */ #include "mballoc.h" +#include <linux/debugfs.h> #include <trace/events/ext4.h> /* @@ -743,7 +744,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore) char *data; char *bitmap; - mb_debug("init page %lu\n", page->index); + mb_debug(1, "init page %lu\n", page->index); inode = page->mapping->host; sb = inode->i_sb; @@ -822,7 +823,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore) set_bitmap_uptodate(bh[i]); bh[i]->b_end_io = end_buffer_read_sync; submit_bh(READ, bh[i]); - mb_debug("read bitmap for group %u\n", first_group + i); + mb_debug(1, "read bitmap for group %u\n", first_group + i); } /* wait for I/O completion */ @@ -862,7 +863,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore) if ((first_block + i) & 1) { /* this is block of buddy */ BUG_ON(incore == NULL); - mb_debug("put buddy for group %u in page %lu/%x\n", + mb_debug(1, "put buddy for group %u in page %lu/%x\n", group, page->index, i * blocksize); grinfo = ext4_get_group_info(sb, group); grinfo->bb_fragments = 0; @@ -878,7 +879,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore) } else { /* this is block of bitmap */ BUG_ON(incore != NULL); - mb_debug("put bitmap for group %u in page %lu/%x\n", + mb_debug(1, "put bitmap for group %u in page %lu/%x\n", group, page->index, i * blocksize); /* see comments in ext4_mb_put_pa() */ @@ -922,7 +923,7 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group, struct ext4_sb_info *sbi = EXT4_SB(sb); struct inode *inode = sbi->s_buddy_cache; - mb_debug("load group %u\n", group); + mb_debug(1, "load group %u\n", group); blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize; grp = ext4_get_group_info(sb, group); @@ -1851,7 +1852,7 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group) struct inode *inode = sbi->s_buddy_cache; struct page *page = NULL, *bitmap_page = NULL; - mb_debug("init group %u\n", group); + mb_debug(1, "init group %u\n", group); blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize; this_grp = ext4_get_group_info(sb, group); /* @@ -2739,7 +2740,7 @@ static void ext4_mb_cleanup_pa(struct ext4_group_info *grp) kmem_cache_free(ext4_pspace_cachep, pa); } if (count) - mb_debug("mballoc: %u PAs left\n", count); + mb_debug(1, "mballoc: %u PAs left\n", count); } @@ -2820,7 +2821,7 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn) list_for_each_safe(l, ltmp, &txn->t_private_list) { entry = list_entry(l, struct ext4_free_data, list); - mb_debug("gonna free %u blocks in group %u (0x%p):", + mb_debug(1, "gonna free %u blocks in group %u (0x%p):", entry->count, entry->group, entry); err = ext4_mb_load_buddy(sb, entry->group, &e4b); @@ -2855,9 +2856,43 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn) ext4_mb_release_desc(&e4b); } - mb_debug("freed %u blocks in %u structures\n", count, count2); + mb_debug(1, "freed %u blocks in %u structures\n", count, count2); } +#ifdef CONFIG_EXT4_DEBUG +u8 mb_enable_debug __read_mostly; + +static struct dentry *debugfs_dir; +static struct dentry *debugfs_debug; + +static void __init ext4_create_debugfs_entry(void) +{ + debugfs_dir = debugfs_create_dir("ext4", NULL); + if (debugfs_dir) + debugfs_debug = debugfs_create_u8("mballoc-debug", + S_IRUGO | S_IWUSR, + debugfs_dir, + &mb_enable_debug); +} + +static void __exit ext4_remove_debugfs_entry(void) +{ + debugfs_remove(debugfs_debug); + debugfs_remove(debugfs_dir); +} + +#else + +static void __init ext4_create_debugfs_entry(void) +{ +} + +static void __exit ext4_remove_debugfs_entry(void) +{ +} + +#endif + int __init init_ext4_mballoc(void) { ext4_pspace_cachep = @@ -2885,6 +2920,7 @@ int __init init_ext4_mballoc(void) kmem_cache_destroy(ext4_ac_cachep); return -ENOMEM; } + ext4_create_debugfs_entry(); return 0; } @@ -2898,6 +2934,7 @@ void exit_ext4_mballoc(void) kmem_cache_destroy(ext4_pspace_cachep); kmem_cache_destroy(ext4_ac_cachep); kmem_cache_destroy(ext4_free_ext_cachep); + ext4_remove_debugfs_entry(); } @@ -3042,7 +3079,7 @@ static void ext4_mb_normalize_group_request(struct ext4_allocation_context *ac) ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_stripe; else ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc; - mb_debug("#%u: goal %u blocks for locality group\n", + mb_debug(1, "#%u: goal %u blocks for locality group\n", current->pid, ac->ac_g_ex.fe_len); } @@ -3232,7 +3269,7 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac, ac->ac_flags |= EXT4_MB_HINT_TRY_GOAL; } - mb_debug("goal: %u(was %u) blocks at %u\n", (unsigned) size, + mb_debug(1, "goal: %u(was %u) blocks at %u\n", (unsigned) size, (unsigned) orig_size, (unsigned) start); } @@ -3281,7 +3318,7 @@ static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac, BUG_ON(pa->pa_free < len); pa->pa_free -= len; - mb_debug("use %llu/%u from inode pa %p\n", start, len, pa); + mb_debug(1, "use %llu/%u from inode pa %p\n", start, len, pa); } /* @@ -3305,7 +3342,7 @@ static void ext4_mb_use_group_pa(struct ext4_allocation_context *ac, * in on-disk bitmap -- see ext4_mb_release_context() * Other CPUs are prevented from allocating from this pa by lg_mutex */ - mb_debug("use %u/%u from group pa %p\n", pa->pa_lstart-len, len, pa); + mb_debug(1, "use %u/%u from group pa %p\n", pa->pa_lstart-len, len, pa); } /* @@ -3484,7 +3521,7 @@ void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap, preallocated += len; count++; } - mb_debug("prellocated %u for group %u\n", preallocated, group); + mb_debug(1, "prellocated %u for group %u\n", preallocated, group); } static void ext4_mb_pa_callback(struct rcu_head *head) @@ -3619,7 +3656,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) pa->pa_deleted = 0; pa->pa_type = MB_INODE_PA; - mb_debug("new inode pa %p: %llu/%u for %u\n", pa, + mb_debug(1, "new inode pa %p: %llu/%u for %u\n", pa, pa->pa_pstart, pa->pa_len, pa->pa_lstart); trace_ext4_mb_new_inode_pa(ac, pa); @@ -3679,7 +3716,7 @@ ext4_mb_new_group_pa(struct ext4_allocation_context *ac) pa->pa_deleted = 0; pa->pa_type = MB_GROUP_PA; - mb_debug("new group pa %p: %llu/%u for %u\n", pa, + mb_debug(1, "new group pa %p: %llu/%u for %u\n", pa, pa->pa_pstart, pa->pa_len, pa->pa_lstart); trace_ext4_mb_new_group_pa(ac, pa); @@ -3758,7 +3795,7 @@ ext4_mb_release_inode_pa(struct ext4_buddy *e4b, struct buffer_head *bitmap_bh, next = mb_find_next_bit(bitmap_bh->b_data, end, bit); start = group * EXT4_BLOCKS_PER_GROUP(sb) + bit + le32_to_cpu(sbi->s_es->s_first_data_block); - mb_debug(" free preallocated %u/%u in group %u\n", + mb_debug(1, " free preallocated %u/%u in group %u\n", (unsigned) start, (unsigned) next - bit, (unsigned) group); free += next - bit; @@ -3849,7 +3886,7 @@ ext4_mb_discard_group_preallocations(struct super_block *sb, int busy = 0; int free = 0; - mb_debug("discard preallocation for group %u\n", group); + mb_debug(1, "discard preallocation for group %u\n", group); if (list_empty(&grp->bb_prealloc_list)) return 0; @@ -3973,7 +4010,7 @@ void ext4_discard_preallocations(struct inode *inode) return; } - mb_debug("discard preallocation for inode %lu\n", inode->i_ino); + mb_debug(1, "discard preallocation for inode %lu\n", inode->i_ino); trace_ext4_discard_preallocations(inode); INIT_LIST_HEAD(&list); @@ -4078,7 +4115,7 @@ static void ext4_mb_return_to_preallocation(struct inode *inode, { BUG_ON(!list_empty(&EXT4_I(inode)->i_prealloc_list)); } -#ifdef MB_DEBUG +#ifdef CONFIG_EXT4_DEBUG static void ext4_mb_show_ac(struct ext4_allocation_context *ac) { struct super_block *sb = ac->ac_sb; @@ -4227,7 +4264,7 @@ ext4_mb_initialize_context(struct ext4_allocation_context *ac, * locality group. this is a policy, actually */ ext4_mb_group_or_file(ac); - mb_debug("init ac: %u blocks @ %u, goal %u, flags %x, 2^%d, " + mb_debug(1, "init ac: %u blocks @ %u, goal %u, flags %x, 2^%d, " "left: %u/%u, right %u/%u to %swritable\n", (unsigned) ar->len, (unsigned) ar->logical, (unsigned) ar->goal, ac->ac_flags, ac->ac_2order, @@ -4249,7 +4286,7 @@ ext4_mb_discard_lg_preallocations(struct super_block *sb, struct ext4_prealloc_space *pa, *tmp; struct ext4_allocation_context *ac; - mb_debug("discard locality group preallocation\n"); + mb_debug(1, "discard locality group preallocation\n"); INIT_LIST_HEAD(&discard_list); ac = kmem_cache_alloc(ext4_ac_cachep, GFP_NOFS); diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h index c96bb19..28abb02 100644 --- a/fs/ext4/mballoc.h +++ b/fs/ext4/mballoc.h @@ -37,11 +37,19 @@ /* */ -#define MB_DEBUG__ -#ifdef MB_DEBUG -#define mb_debug(fmt, a...) printk(fmt, ##a) +#ifdef CONFIG_EXT4_DEBUG +extern u8 mb_enable_debug; + +#define mb_debug(n, fmt, a...) \ + do { \ + if ((n) <= mb_enable_debug) { \ + printk (KERN_DEBUG "(%s, %d): %s: ", \ + __FILE__, __LINE__, __func__); \ + printk (fmt, ## a); \ + } \ + } while (0) #else -#define mb_debug(fmt, a...) +#define mb_debug(n, fmt, a...) #endif /* -- 1.6.3.2.1.gb9f7d.dirty ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH, RFC -V2 1/4] ext4: Add configurable run-time mballoc debugging 2009-08-10 3:23 ` [PATCH, RFC -V2 1/4] ext4: Add configurable run-time mballoc debugging Theodore Ts'o @ 2009-08-10 3:42 ` Eric Sandeen 2009-08-10 20:23 ` Theodore Tso 2009-08-11 18:15 ` Xiang Wang 1 sibling, 1 reply; 15+ messages in thread From: Eric Sandeen @ 2009-08-10 3:42 UTC (permalink / raw) To: Theodore Ts'o; +Cc: linux-ext4, Andreas Dilger, Alex Tomas Theodore Ts'o wrote: > Allow mballoc debugging to be enabled at run-time instead of just at > compile time. > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> > --- > fs/ext4/Kconfig | 9 ++++++ > fs/ext4/mballoc.c | 81 ++++++++++++++++++++++++++++++++++++++-------------- > fs/ext4/mballoc.h | 16 ++++++++-- > 3 files changed, 80 insertions(+), 26 deletions(-) This looks fine to me, though is there any reason to add the debug verbosity level at this point, when it's all just "1?" If you're going to do that I suppose I'd suggest some #defines that give an idea of the expected range... at least low/medium/high... will our debug go to 11? ;) -Eric > diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig > index 1523046..d5c0ea2 100644 > --- a/fs/ext4/Kconfig > +++ b/fs/ext4/Kconfig > @@ -77,3 +77,12 @@ config EXT4_FS_SECURITY > > If you are not using a security module that requires using > extended attributes for file security labels, say N. > + > +config EXT4_DEBUG > + bool "EXT4 debugging support" > + depends on EXT4_FS > + help > + Enables run-time debugging support for the ext4 filesystem. > + > + If you select Y here, then you will be able to turn on debugging > + with a command such as "echo 1 > /sys/kernel/debug/ext4/mballoc-debug" > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c > index 68cde59..2c81240 100644 > --- a/fs/ext4/mballoc.c > +++ b/fs/ext4/mballoc.c > @@ -22,6 +22,7 @@ > */ > > #include "mballoc.h" > +#include <linux/debugfs.h> > #include <trace/events/ext4.h> > > /* > @@ -743,7 +744,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore) > char *data; > char *bitmap; > > - mb_debug("init page %lu\n", page->index); > + mb_debug(1, "init page %lu\n", page->index); > > inode = page->mapping->host; > sb = inode->i_sb; > @@ -822,7 +823,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore) > set_bitmap_uptodate(bh[i]); > bh[i]->b_end_io = end_buffer_read_sync; > submit_bh(READ, bh[i]); > - mb_debug("read bitmap for group %u\n", first_group + i); > + mb_debug(1, "read bitmap for group %u\n", first_group + i); > } > > /* wait for I/O completion */ > @@ -862,7 +863,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore) > if ((first_block + i) & 1) { > /* this is block of buddy */ > BUG_ON(incore == NULL); > - mb_debug("put buddy for group %u in page %lu/%x\n", > + mb_debug(1, "put buddy for group %u in page %lu/%x\n", > group, page->index, i * blocksize); > grinfo = ext4_get_group_info(sb, group); > grinfo->bb_fragments = 0; > @@ -878,7 +879,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore) > } else { > /* this is block of bitmap */ > BUG_ON(incore != NULL); > - mb_debug("put bitmap for group %u in page %lu/%x\n", > + mb_debug(1, "put bitmap for group %u in page %lu/%x\n", > group, page->index, i * blocksize); > > /* see comments in ext4_mb_put_pa() */ > @@ -922,7 +923,7 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group, > struct ext4_sb_info *sbi = EXT4_SB(sb); > struct inode *inode = sbi->s_buddy_cache; > > - mb_debug("load group %u\n", group); > + mb_debug(1, "load group %u\n", group); > > blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize; > grp = ext4_get_group_info(sb, group); > @@ -1851,7 +1852,7 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group) > struct inode *inode = sbi->s_buddy_cache; > struct page *page = NULL, *bitmap_page = NULL; > > - mb_debug("init group %u\n", group); > + mb_debug(1, "init group %u\n", group); > blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize; > this_grp = ext4_get_group_info(sb, group); > /* > @@ -2739,7 +2740,7 @@ static void ext4_mb_cleanup_pa(struct ext4_group_info *grp) > kmem_cache_free(ext4_pspace_cachep, pa); > } > if (count) > - mb_debug("mballoc: %u PAs left\n", count); > + mb_debug(1, "mballoc: %u PAs left\n", count); > > } > > @@ -2820,7 +2821,7 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn) > list_for_each_safe(l, ltmp, &txn->t_private_list) { > entry = list_entry(l, struct ext4_free_data, list); > > - mb_debug("gonna free %u blocks in group %u (0x%p):", > + mb_debug(1, "gonna free %u blocks in group %u (0x%p):", > entry->count, entry->group, entry); > > err = ext4_mb_load_buddy(sb, entry->group, &e4b); > @@ -2855,9 +2856,43 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn) > ext4_mb_release_desc(&e4b); > } > > - mb_debug("freed %u blocks in %u structures\n", count, count2); > + mb_debug(1, "freed %u blocks in %u structures\n", count, count2); > } > > +#ifdef CONFIG_EXT4_DEBUG > +u8 mb_enable_debug __read_mostly; > + > +static struct dentry *debugfs_dir; > +static struct dentry *debugfs_debug; > + > +static void __init ext4_create_debugfs_entry(void) > +{ > + debugfs_dir = debugfs_create_dir("ext4", NULL); > + if (debugfs_dir) > + debugfs_debug = debugfs_create_u8("mballoc-debug", > + S_IRUGO | S_IWUSR, > + debugfs_dir, > + &mb_enable_debug); > +} > + > +static void __exit ext4_remove_debugfs_entry(void) > +{ > + debugfs_remove(debugfs_debug); > + debugfs_remove(debugfs_dir); > +} > + > +#else > + > +static void __init ext4_create_debugfs_entry(void) > +{ > +} > + > +static void __exit ext4_remove_debugfs_entry(void) > +{ > +} > + > +#endif > + > int __init init_ext4_mballoc(void) > { > ext4_pspace_cachep = > @@ -2885,6 +2920,7 @@ int __init init_ext4_mballoc(void) > kmem_cache_destroy(ext4_ac_cachep); > return -ENOMEM; > } > + ext4_create_debugfs_entry(); > return 0; > } > > @@ -2898,6 +2934,7 @@ void exit_ext4_mballoc(void) > kmem_cache_destroy(ext4_pspace_cachep); > kmem_cache_destroy(ext4_ac_cachep); > kmem_cache_destroy(ext4_free_ext_cachep); > + ext4_remove_debugfs_entry(); > } > > > @@ -3042,7 +3079,7 @@ static void ext4_mb_normalize_group_request(struct ext4_allocation_context *ac) > ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_stripe; > else > ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc; > - mb_debug("#%u: goal %u blocks for locality group\n", > + mb_debug(1, "#%u: goal %u blocks for locality group\n", > current->pid, ac->ac_g_ex.fe_len); > } > > @@ -3232,7 +3269,7 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac, > ac->ac_flags |= EXT4_MB_HINT_TRY_GOAL; > } > > - mb_debug("goal: %u(was %u) blocks at %u\n", (unsigned) size, > + mb_debug(1, "goal: %u(was %u) blocks at %u\n", (unsigned) size, > (unsigned) orig_size, (unsigned) start); > } > > @@ -3281,7 +3318,7 @@ static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac, > BUG_ON(pa->pa_free < len); > pa->pa_free -= len; > > - mb_debug("use %llu/%u from inode pa %p\n", start, len, pa); > + mb_debug(1, "use %llu/%u from inode pa %p\n", start, len, pa); > } > > /* > @@ -3305,7 +3342,7 @@ static void ext4_mb_use_group_pa(struct ext4_allocation_context *ac, > * in on-disk bitmap -- see ext4_mb_release_context() > * Other CPUs are prevented from allocating from this pa by lg_mutex > */ > - mb_debug("use %u/%u from group pa %p\n", pa->pa_lstart-len, len, pa); > + mb_debug(1, "use %u/%u from group pa %p\n", pa->pa_lstart-len, len, pa); > } > > /* > @@ -3484,7 +3521,7 @@ void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap, > preallocated += len; > count++; > } > - mb_debug("prellocated %u for group %u\n", preallocated, group); > + mb_debug(1, "prellocated %u for group %u\n", preallocated, group); > } > > static void ext4_mb_pa_callback(struct rcu_head *head) > @@ -3619,7 +3656,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) > pa->pa_deleted = 0; > pa->pa_type = MB_INODE_PA; > > - mb_debug("new inode pa %p: %llu/%u for %u\n", pa, > + mb_debug(1, "new inode pa %p: %llu/%u for %u\n", pa, > pa->pa_pstart, pa->pa_len, pa->pa_lstart); > trace_ext4_mb_new_inode_pa(ac, pa); > > @@ -3679,7 +3716,7 @@ ext4_mb_new_group_pa(struct ext4_allocation_context *ac) > pa->pa_deleted = 0; > pa->pa_type = MB_GROUP_PA; > > - mb_debug("new group pa %p: %llu/%u for %u\n", pa, > + mb_debug(1, "new group pa %p: %llu/%u for %u\n", pa, > pa->pa_pstart, pa->pa_len, pa->pa_lstart); > trace_ext4_mb_new_group_pa(ac, pa); > > @@ -3758,7 +3795,7 @@ ext4_mb_release_inode_pa(struct ext4_buddy *e4b, struct buffer_head *bitmap_bh, > next = mb_find_next_bit(bitmap_bh->b_data, end, bit); > start = group * EXT4_BLOCKS_PER_GROUP(sb) + bit + > le32_to_cpu(sbi->s_es->s_first_data_block); > - mb_debug(" free preallocated %u/%u in group %u\n", > + mb_debug(1, " free preallocated %u/%u in group %u\n", > (unsigned) start, (unsigned) next - bit, > (unsigned) group); > free += next - bit; > @@ -3849,7 +3886,7 @@ ext4_mb_discard_group_preallocations(struct super_block *sb, > int busy = 0; > int free = 0; > > - mb_debug("discard preallocation for group %u\n", group); > + mb_debug(1, "discard preallocation for group %u\n", group); > > if (list_empty(&grp->bb_prealloc_list)) > return 0; > @@ -3973,7 +4010,7 @@ void ext4_discard_preallocations(struct inode *inode) > return; > } > > - mb_debug("discard preallocation for inode %lu\n", inode->i_ino); > + mb_debug(1, "discard preallocation for inode %lu\n", inode->i_ino); > trace_ext4_discard_preallocations(inode); > > INIT_LIST_HEAD(&list); > @@ -4078,7 +4115,7 @@ static void ext4_mb_return_to_preallocation(struct inode *inode, > { > BUG_ON(!list_empty(&EXT4_I(inode)->i_prealloc_list)); > } > -#ifdef MB_DEBUG > +#ifdef CONFIG_EXT4_DEBUG > static void ext4_mb_show_ac(struct ext4_allocation_context *ac) > { > struct super_block *sb = ac->ac_sb; > @@ -4227,7 +4264,7 @@ ext4_mb_initialize_context(struct ext4_allocation_context *ac, > * locality group. this is a policy, actually */ > ext4_mb_group_or_file(ac); > > - mb_debug("init ac: %u blocks @ %u, goal %u, flags %x, 2^%d, " > + mb_debug(1, "init ac: %u blocks @ %u, goal %u, flags %x, 2^%d, " > "left: %u/%u, right %u/%u to %swritable\n", > (unsigned) ar->len, (unsigned) ar->logical, > (unsigned) ar->goal, ac->ac_flags, ac->ac_2order, > @@ -4249,7 +4286,7 @@ ext4_mb_discard_lg_preallocations(struct super_block *sb, > struct ext4_prealloc_space *pa, *tmp; > struct ext4_allocation_context *ac; > > - mb_debug("discard locality group preallocation\n"); > + mb_debug(1, "discard locality group preallocation\n"); > > INIT_LIST_HEAD(&discard_list); > ac = kmem_cache_alloc(ext4_ac_cachep, GFP_NOFS); > diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h > index c96bb19..28abb02 100644 > --- a/fs/ext4/mballoc.h > +++ b/fs/ext4/mballoc.h > @@ -37,11 +37,19 @@ > > /* > */ > -#define MB_DEBUG__ > -#ifdef MB_DEBUG > -#define mb_debug(fmt, a...) printk(fmt, ##a) > +#ifdef CONFIG_EXT4_DEBUG > +extern u8 mb_enable_debug; > + > +#define mb_debug(n, fmt, a...) \ > + do { \ > + if ((n) <= mb_enable_debug) { \ > + printk (KERN_DEBUG "(%s, %d): %s: ", \ > + __FILE__, __LINE__, __func__); \ > + printk (fmt, ## a); \ > + } \ > + } while (0) > #else > -#define mb_debug(fmt, a...) > +#define mb_debug(n, fmt, a...) > #endif > > /* ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH, RFC -V2 1/4] ext4: Add configurable run-time mballoc debugging 2009-08-10 3:42 ` Eric Sandeen @ 2009-08-10 20:23 ` Theodore Tso 0 siblings, 0 replies; 15+ messages in thread From: Theodore Tso @ 2009-08-10 20:23 UTC (permalink / raw) To: Eric Sandeen; +Cc: linux-ext4, Andreas Dilger, Alex Tomas On Sun, Aug 09, 2009 at 10:42:17PM -0500, Eric Sandeen wrote: > Theodore Ts'o wrote: > > Allow mballoc debugging to be enabled at run-time instead of just at > > compile time. > > > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> > > --- > > fs/ext4/Kconfig | 9 ++++++ > > fs/ext4/mballoc.c | 81 ++++++++++++++++++++++++++++++++++++++-------------- > > fs/ext4/mballoc.h | 16 ++++++++-- > > 3 files changed, 80 insertions(+), 26 deletions(-) > > This looks fine to me, though is there any reason to add the debug > verbosity level at this point, when it's all just "1?" What I'm thinking about doing is to use a bitmask instead of a verbosity level. That way it will be possible to selectively enable some debugging print messages and not others. In the long run the right answer is to use ftrace instead, but this is faster and more convenient when doing surgery/development on the mballoc code. - Ted ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH, RFC -V2 1/4] ext4: Add configurable run-time mballoc debugging 2009-08-10 3:23 ` [PATCH, RFC -V2 1/4] ext4: Add configurable run-time mballoc debugging Theodore Ts'o 2009-08-10 3:42 ` Eric Sandeen @ 2009-08-11 18:15 ` Xiang Wang 2009-08-11 18:53 ` Theodore Tso 1 sibling, 1 reply; 15+ messages in thread From: Xiang Wang @ 2009-08-11 18:15 UTC (permalink / raw) To: Theodore Ts'o; +Cc: linux-ext4, Andreas Dilger, Alex Tomas, Curt Wohlgemuth Hi Ted, I tried to apply this patch to our kernel, but I encountered some problems in building the kernel: FATAL: fs/ext4/ext4.o(.text+0x24f2d): Section mismatch in reference from the function exit_ext4_mballoc() to the function .exit.text:ext4_remove_debugfs_entry() Looking at the code, the problem seems to be that, exit_ext4_mballoc calls ext4_remove_debugfs_entry. And ext4_remove_debugfs_entry has the "__exit" annotation while exit_ext4_mballoc does not. I tried removing the "__exit" annotation from ext4_remove_debugfs_entry and it builds well. I am wondering whether you have seen any similar problems. Thanks, Xiang On Sun, Aug 9, 2009 at 8:23 PM, Theodore Ts'o<tytso@mit.edu> wrote: > Allow mballoc debugging to be enabled at run-time instead of just at > compile time. > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> > --- > fs/ext4/Kconfig | 9 ++++++ > fs/ext4/mballoc.c | 81 ++++++++++++++++++++++++++++++++++++++-------------- > fs/ext4/mballoc.h | 16 ++++++++-- > 3 files changed, 80 insertions(+), 26 deletions(-) > > diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig > index 1523046..d5c0ea2 100644 > --- a/fs/ext4/Kconfig > +++ b/fs/ext4/Kconfig > @@ -77,3 +77,12 @@ config EXT4_FS_SECURITY > > If you are not using a security module that requires using > extended attributes for file security labels, say N. > + > +config EXT4_DEBUG > + bool "EXT4 debugging support" > + depends on EXT4_FS > + help > + Enables run-time debugging support for the ext4 filesystem. > + > + If you select Y here, then you will be able to turn on debugging > + with a command such as "echo 1 > /sys/kernel/debug/ext4/mballoc-debug" > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c > index 68cde59..2c81240 100644 > --- a/fs/ext4/mballoc.c > +++ b/fs/ext4/mballoc.c > @@ -22,6 +22,7 @@ > */ > > #include "mballoc.h" > +#include <linux/debugfs.h> > #include <trace/events/ext4.h> > > /* > @@ -743,7 +744,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore) > char *data; > char *bitmap; > > - mb_debug("init page %lu\n", page->index); > + mb_debug(1, "init page %lu\n", page->index); > > inode = page->mapping->host; > sb = inode->i_sb; > @@ -822,7 +823,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore) > set_bitmap_uptodate(bh[i]); > bh[i]->b_end_io = end_buffer_read_sync; > submit_bh(READ, bh[i]); > - mb_debug("read bitmap for group %u\n", first_group + i); > + mb_debug(1, "read bitmap for group %u\n", first_group + i); > } > > /* wait for I/O completion */ > @@ -862,7 +863,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore) > if ((first_block + i) & 1) { > /* this is block of buddy */ > BUG_ON(incore == NULL); > - mb_debug("put buddy for group %u in page %lu/%x\n", > + mb_debug(1, "put buddy for group %u in page %lu/%x\n", > group, page->index, i * blocksize); > grinfo = ext4_get_group_info(sb, group); > grinfo->bb_fragments = 0; > @@ -878,7 +879,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore) > } else { > /* this is block of bitmap */ > BUG_ON(incore != NULL); > - mb_debug("put bitmap for group %u in page %lu/%x\n", > + mb_debug(1, "put bitmap for group %u in page %lu/%x\n", > group, page->index, i * blocksize); > > /* see comments in ext4_mb_put_pa() */ > @@ -922,7 +923,7 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group, > struct ext4_sb_info *sbi = EXT4_SB(sb); > struct inode *inode = sbi->s_buddy_cache; > > - mb_debug("load group %u\n", group); > + mb_debug(1, "load group %u\n", group); > > blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize; > grp = ext4_get_group_info(sb, group); > @@ -1851,7 +1852,7 @@ int ext4_mb_init_group(struct super_block *sb, ext4_group_t group) > struct inode *inode = sbi->s_buddy_cache; > struct page *page = NULL, *bitmap_page = NULL; > > - mb_debug("init group %u\n", group); > + mb_debug(1, "init group %u\n", group); > blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize; > this_grp = ext4_get_group_info(sb, group); > /* > @@ -2739,7 +2740,7 @@ static void ext4_mb_cleanup_pa(struct ext4_group_info *grp) > kmem_cache_free(ext4_pspace_cachep, pa); > } > if (count) > - mb_debug("mballoc: %u PAs left\n", count); > + mb_debug(1, "mballoc: %u PAs left\n", count); > > } > > @@ -2820,7 +2821,7 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn) > list_for_each_safe(l, ltmp, &txn->t_private_list) { > entry = list_entry(l, struct ext4_free_data, list); > > - mb_debug("gonna free %u blocks in group %u (0x%p):", > + mb_debug(1, "gonna free %u blocks in group %u (0x%p):", > entry->count, entry->group, entry); > > err = ext4_mb_load_buddy(sb, entry->group, &e4b); > @@ -2855,9 +2856,43 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn) > ext4_mb_release_desc(&e4b); > } > > - mb_debug("freed %u blocks in %u structures\n", count, count2); > + mb_debug(1, "freed %u blocks in %u structures\n", count, count2); > } > > +#ifdef CONFIG_EXT4_DEBUG > +u8 mb_enable_debug __read_mostly; > + > +static struct dentry *debugfs_dir; > +static struct dentry *debugfs_debug; > + > +static void __init ext4_create_debugfs_entry(void) > +{ > + debugfs_dir = debugfs_create_dir("ext4", NULL); > + if (debugfs_dir) > + debugfs_debug = debugfs_create_u8("mballoc-debug", > + S_IRUGO | S_IWUSR, > + debugfs_dir, > + &mb_enable_debug); > +} > + > +static void __exit ext4_remove_debugfs_entry(void) > +{ > + debugfs_remove(debugfs_debug); > + debugfs_remove(debugfs_dir); > +} > + > +#else > + > +static void __init ext4_create_debugfs_entry(void) > +{ > +} > + > +static void __exit ext4_remove_debugfs_entry(void) > +{ > +} > + > +#endif > + > int __init init_ext4_mballoc(void) > { > ext4_pspace_cachep = > @@ -2885,6 +2920,7 @@ int __init init_ext4_mballoc(void) > kmem_cache_destroy(ext4_ac_cachep); > return -ENOMEM; > } > + ext4_create_debugfs_entry(); > return 0; > } > > @@ -2898,6 +2934,7 @@ void exit_ext4_mballoc(void) > kmem_cache_destroy(ext4_pspace_cachep); > kmem_cache_destroy(ext4_ac_cachep); > kmem_cache_destroy(ext4_free_ext_cachep); > + ext4_remove_debugfs_entry(); > } > > > @@ -3042,7 +3079,7 @@ static void ext4_mb_normalize_group_request(struct ext4_allocation_context *ac) > ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_stripe; > else > ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc; > - mb_debug("#%u: goal %u blocks for locality group\n", > + mb_debug(1, "#%u: goal %u blocks for locality group\n", > current->pid, ac->ac_g_ex.fe_len); > } > > @@ -3232,7 +3269,7 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac, > ac->ac_flags |= EXT4_MB_HINT_TRY_GOAL; > } > > - mb_debug("goal: %u(was %u) blocks at %u\n", (unsigned) size, > + mb_debug(1, "goal: %u(was %u) blocks at %u\n", (unsigned) size, > (unsigned) orig_size, (unsigned) start); > } > > @@ -3281,7 +3318,7 @@ static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac, > BUG_ON(pa->pa_free < len); > pa->pa_free -= len; > > - mb_debug("use %llu/%u from inode pa %p\n", start, len, pa); > + mb_debug(1, "use %llu/%u from inode pa %p\n", start, len, pa); > } > > /* > @@ -3305,7 +3342,7 @@ static void ext4_mb_use_group_pa(struct ext4_allocation_context *ac, > * in on-disk bitmap -- see ext4_mb_release_context() > * Other CPUs are prevented from allocating from this pa by lg_mutex > */ > - mb_debug("use %u/%u from group pa %p\n", pa->pa_lstart-len, len, pa); > + mb_debug(1, "use %u/%u from group pa %p\n", pa->pa_lstart-len, len, pa); > } > > /* > @@ -3484,7 +3521,7 @@ void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap, > preallocated += len; > count++; > } > - mb_debug("prellocated %u for group %u\n", preallocated, group); > + mb_debug(1, "prellocated %u for group %u\n", preallocated, group); > } > > static void ext4_mb_pa_callback(struct rcu_head *head) > @@ -3619,7 +3656,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) > pa->pa_deleted = 0; > pa->pa_type = MB_INODE_PA; > > - mb_debug("new inode pa %p: %llu/%u for %u\n", pa, > + mb_debug(1, "new inode pa %p: %llu/%u for %u\n", pa, > pa->pa_pstart, pa->pa_len, pa->pa_lstart); > trace_ext4_mb_new_inode_pa(ac, pa); > > @@ -3679,7 +3716,7 @@ ext4_mb_new_group_pa(struct ext4_allocation_context *ac) > pa->pa_deleted = 0; > pa->pa_type = MB_GROUP_PA; > > - mb_debug("new group pa %p: %llu/%u for %u\n", pa, > + mb_debug(1, "new group pa %p: %llu/%u for %u\n", pa, > pa->pa_pstart, pa->pa_len, pa->pa_lstart); > trace_ext4_mb_new_group_pa(ac, pa); > > @@ -3758,7 +3795,7 @@ ext4_mb_release_inode_pa(struct ext4_buddy *e4b, struct buffer_head *bitmap_bh, > next = mb_find_next_bit(bitmap_bh->b_data, end, bit); > start = group * EXT4_BLOCKS_PER_GROUP(sb) + bit + > le32_to_cpu(sbi->s_es->s_first_data_block); > - mb_debug(" free preallocated %u/%u in group %u\n", > + mb_debug(1, " free preallocated %u/%u in group %u\n", > (unsigned) start, (unsigned) next - bit, > (unsigned) group); > free += next - bit; > @@ -3849,7 +3886,7 @@ ext4_mb_discard_group_preallocations(struct super_block *sb, > int busy = 0; > int free = 0; > > - mb_debug("discard preallocation for group %u\n", group); > + mb_debug(1, "discard preallocation for group %u\n", group); > > if (list_empty(&grp->bb_prealloc_list)) > return 0; > @@ -3973,7 +4010,7 @@ void ext4_discard_preallocations(struct inode *inode) > return; > } > > - mb_debug("discard preallocation for inode %lu\n", inode->i_ino); > + mb_debug(1, "discard preallocation for inode %lu\n", inode->i_ino); > trace_ext4_discard_preallocations(inode); > > INIT_LIST_HEAD(&list); > @@ -4078,7 +4115,7 @@ static void ext4_mb_return_to_preallocation(struct inode *inode, > { > BUG_ON(!list_empty(&EXT4_I(inode)->i_prealloc_list)); > } > -#ifdef MB_DEBUG > +#ifdef CONFIG_EXT4_DEBUG > static void ext4_mb_show_ac(struct ext4_allocation_context *ac) > { > struct super_block *sb = ac->ac_sb; > @@ -4227,7 +4264,7 @@ ext4_mb_initialize_context(struct ext4_allocation_context *ac, > * locality group. this is a policy, actually */ > ext4_mb_group_or_file(ac); > > - mb_debug("init ac: %u blocks @ %u, goal %u, flags %x, 2^%d, " > + mb_debug(1, "init ac: %u blocks @ %u, goal %u, flags %x, 2^%d, " > "left: %u/%u, right %u/%u to %swritable\n", > (unsigned) ar->len, (unsigned) ar->logical, > (unsigned) ar->goal, ac->ac_flags, ac->ac_2order, > @@ -4249,7 +4286,7 @@ ext4_mb_discard_lg_preallocations(struct super_block *sb, > struct ext4_prealloc_space *pa, *tmp; > struct ext4_allocation_context *ac; > > - mb_debug("discard locality group preallocation\n"); > + mb_debug(1, "discard locality group preallocation\n"); > > INIT_LIST_HEAD(&discard_list); > ac = kmem_cache_alloc(ext4_ac_cachep, GFP_NOFS); > diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h > index c96bb19..28abb02 100644 > --- a/fs/ext4/mballoc.h > +++ b/fs/ext4/mballoc.h > @@ -37,11 +37,19 @@ > > /* > */ > -#define MB_DEBUG__ > -#ifdef MB_DEBUG > -#define mb_debug(fmt, a...) printk(fmt, ##a) > +#ifdef CONFIG_EXT4_DEBUG > +extern u8 mb_enable_debug; > + > +#define mb_debug(n, fmt, a...) \ > + do { \ > + if ((n) <= mb_enable_debug) { \ > + printk (KERN_DEBUG "(%s, %d): %s: ", \ > + __FILE__, __LINE__, __func__); \ > + printk (fmt, ## a); \ > + } \ > + } while (0) > #else > -#define mb_debug(fmt, a...) > +#define mb_debug(n, fmt, a...) > #endif > > /* > -- > 1.6.3.2.1.gb9f7d.dirty > > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH, RFC -V2 1/4] ext4: Add configurable run-time mballoc debugging 2009-08-11 18:15 ` Xiang Wang @ 2009-08-11 18:53 ` Theodore Tso 0 siblings, 0 replies; 15+ messages in thread From: Theodore Tso @ 2009-08-11 18:53 UTC (permalink / raw) To: Xiang Wang; +Cc: linux-ext4, Andreas Dilger, Alex Tomas, Curt Wohlgemuth On Tue, Aug 11, 2009 at 11:15:53AM -0700, Xiang Wang wrote: > Hi Ted, > > I tried to apply this patch to our kernel, but I encountered some > problems in building the kernel: > > FATAL: fs/ext4/ext4.o(.text+0x24f2d): Section mismatch in reference > from the function exit_ext4_mballoc() to the function > .exit.text:ext4_remove_debugfs_entry() > > Looking at the code, the problem seems to be that, exit_ext4_mballoc > calls ext4_remove_debugfs_entry. And ext4_remove_debugfs_entry has the > "__exit" annotation while exit_ext4_mballoc does not. I tried removing > the "__exit" annotation from ext4_remove_debugfs_entry and it builds > well. Good catch; I didn't notice because I wasn't compiling with CONFIG_DEBUG_SECTION_MISMATCH=y. Your fix is the right one. - Ted ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH, RFC -V2 2/4] ext4: Display the mballoc flags in mb_history in hex instead of decimal 2009-08-10 3:23 [PATCH, RFC -V2 0/4] mballoc patches for ext4 Theodore Ts'o 2009-08-10 3:23 ` [PATCH, RFC -V2 1/4] ext4: Add configurable run-time mballoc debugging Theodore Ts'o @ 2009-08-10 3:23 ` Theodore Ts'o 2009-08-10 3:44 ` Eric Sandeen 2009-08-10 3:23 ` [PATCH, RFC -V2 3/4] ext4: Fix bugs in mballoc's stream allocation mode Theodore Ts'o 2009-08-10 3:23 ` [PATCH, RFC -V2 4/4] ext4: Avoid group preallocation for closed files Theodore Ts'o 3 siblings, 1 reply; 15+ messages in thread From: Theodore Ts'o @ 2009-08-10 3:23 UTC (permalink / raw) To: linux-ext4; +Cc: Andreas Dilger, Alex Tomas, Theodore Ts'o Displaying the flags in base 16 makes it easier to see which flags have been set. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> --- fs/ext4/ext4.h | 22 +++++++++++----------- fs/ext4/mballoc.c | 4 ++-- 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 9714db3..e267727 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -67,27 +67,27 @@ typedef unsigned int ext4_group_t; /* prefer goal again. length */ -#define EXT4_MB_HINT_MERGE 1 +#define EXT4_MB_HINT_MERGE 0x0001 /* blocks already reserved */ -#define EXT4_MB_HINT_RESERVED 2 +#define EXT4_MB_HINT_RESERVED 0x0002 /* metadata is being allocated */ -#define EXT4_MB_HINT_METADATA 4 +#define EXT4_MB_HINT_METADATA 0x0004 /* first blocks in the file */ -#define EXT4_MB_HINT_FIRST 8 +#define EXT4_MB_HINT_FIRST 0x0008 /* search for the best chunk */ -#define EXT4_MB_HINT_BEST 16 +#define EXT4_MB_HINT_BEST 0x0010 /* data is being allocated */ -#define EXT4_MB_HINT_DATA 32 +#define EXT4_MB_HINT_DATA 0x0020 /* don't preallocate (for tails) */ -#define EXT4_MB_HINT_NOPREALLOC 64 +#define EXT4_MB_HINT_NOPREALLOC 0x0040 /* allocate for locality group */ -#define EXT4_MB_HINT_GROUP_ALLOC 128 +#define EXT4_MB_HINT_GROUP_ALLOC 0x0080 /* allocate goal blocks or none */ -#define EXT4_MB_HINT_GOAL_ONLY 256 +#define EXT4_MB_HINT_GOAL_ONLY 0x0100 /* goal is meaningful */ -#define EXT4_MB_HINT_TRY_GOAL 512 +#define EXT4_MB_HINT_TRY_GOAL 0x0200 /* blocks already pre-reserved by delayed allocation */ -#define EXT4_MB_DELALLOC_RESERVED 1024 +#define EXT4_MB_DELALLOC_RESERVED 0x0400 struct ext4_allocation_request { diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 2c81240..f510a58 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2157,7 +2157,7 @@ static int ext4_mb_seq_history_show(struct seq_file *seq, void *v) if (v == SEQ_START_TOKEN) { seq_printf(seq, "%-5s %-8s %-23s %-23s %-23s %-5s " - "%-5s %-2s %-5s %-5s %-5s %-6s\n", + "%-5s %-2s %-6s %-5s %-5s %-6s\n", "pid", "inode", "original", "goal", "result", "found", "grps", "cr", "flags", "merge", "tail", "broken"); return 0; @@ -2165,7 +2165,7 @@ static int ext4_mb_seq_history_show(struct seq_file *seq, void *v) if (hs->op == EXT4_MB_HISTORY_ALLOC) { fmt = "%-5u %-8u %-23s %-23s %-23s %-5u %-5u %-2u " - "%-5u %-5s %-5u %-6u\n"; + "0x%04x %-5s %-5u %-6u\n"; sprintf(buf2, "%u/%d/%u@%u", hs->result.fe_group, hs->result.fe_start, hs->result.fe_len, hs->result.fe_logical); -- 1.6.3.2.1.gb9f7d.dirty ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH, RFC -V2 2/4] ext4: Display the mballoc flags in mb_history in hex instead of decimal 2009-08-10 3:23 ` [PATCH, RFC -V2 2/4] ext4: Display the mballoc flags in mb_history in hex instead of decimal Theodore Ts'o @ 2009-08-10 3:44 ` Eric Sandeen 0 siblings, 0 replies; 15+ messages in thread From: Eric Sandeen @ 2009-08-10 3:44 UTC (permalink / raw) To: Theodore Ts'o; +Cc: linux-ext4, Andreas Dilger, Alex Tomas Theodore Ts'o wrote: > Displaying the flags in base 16 makes it easier to see which flags > have been set. Looks good to me, Reviewed-by: Eric Sandeen <sandeen@redhat.com> > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> > --- > fs/ext4/ext4.h | 22 +++++++++++----------- > fs/ext4/mballoc.c | 4 ++-- > 2 files changed, 13 insertions(+), 13 deletions(-) > > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h > index 9714db3..e267727 100644 > --- a/fs/ext4/ext4.h > +++ b/fs/ext4/ext4.h > @@ -67,27 +67,27 @@ typedef unsigned int ext4_group_t; > > > /* prefer goal again. length */ > -#define EXT4_MB_HINT_MERGE 1 > +#define EXT4_MB_HINT_MERGE 0x0001 > /* blocks already reserved */ > -#define EXT4_MB_HINT_RESERVED 2 > +#define EXT4_MB_HINT_RESERVED 0x0002 > /* metadata is being allocated */ > -#define EXT4_MB_HINT_METADATA 4 > +#define EXT4_MB_HINT_METADATA 0x0004 > /* first blocks in the file */ > -#define EXT4_MB_HINT_FIRST 8 > +#define EXT4_MB_HINT_FIRST 0x0008 > /* search for the best chunk */ > -#define EXT4_MB_HINT_BEST 16 > +#define EXT4_MB_HINT_BEST 0x0010 > /* data is being allocated */ > -#define EXT4_MB_HINT_DATA 32 > +#define EXT4_MB_HINT_DATA 0x0020 > /* don't preallocate (for tails) */ > -#define EXT4_MB_HINT_NOPREALLOC 64 > +#define EXT4_MB_HINT_NOPREALLOC 0x0040 > /* allocate for locality group */ > -#define EXT4_MB_HINT_GROUP_ALLOC 128 > +#define EXT4_MB_HINT_GROUP_ALLOC 0x0080 > /* allocate goal blocks or none */ > -#define EXT4_MB_HINT_GOAL_ONLY 256 > +#define EXT4_MB_HINT_GOAL_ONLY 0x0100 > /* goal is meaningful */ > -#define EXT4_MB_HINT_TRY_GOAL 512 > +#define EXT4_MB_HINT_TRY_GOAL 0x0200 > /* blocks already pre-reserved by delayed allocation */ > -#define EXT4_MB_DELALLOC_RESERVED 1024 > +#define EXT4_MB_DELALLOC_RESERVED 0x0400 > > > struct ext4_allocation_request { > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c > index 2c81240..f510a58 100644 > --- a/fs/ext4/mballoc.c > +++ b/fs/ext4/mballoc.c > @@ -2157,7 +2157,7 @@ static int ext4_mb_seq_history_show(struct seq_file *seq, void *v) > > if (v == SEQ_START_TOKEN) { > seq_printf(seq, "%-5s %-8s %-23s %-23s %-23s %-5s " > - "%-5s %-2s %-5s %-5s %-5s %-6s\n", > + "%-5s %-2s %-6s %-5s %-5s %-6s\n", > "pid", "inode", "original", "goal", "result", "found", > "grps", "cr", "flags", "merge", "tail", "broken"); > return 0; > @@ -2165,7 +2165,7 @@ static int ext4_mb_seq_history_show(struct seq_file *seq, void *v) > > if (hs->op == EXT4_MB_HISTORY_ALLOC) { > fmt = "%-5u %-8u %-23s %-23s %-23s %-5u %-5u %-2u " > - "%-5u %-5s %-5u %-6u\n"; > + "0x%04x %-5s %-5u %-6u\n"; > sprintf(buf2, "%u/%d/%u@%u", hs->result.fe_group, > hs->result.fe_start, hs->result.fe_len, > hs->result.fe_logical); ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH, RFC -V2 3/4] ext4: Fix bugs in mballoc's stream allocation mode 2009-08-10 3:23 [PATCH, RFC -V2 0/4] mballoc patches for ext4 Theodore Ts'o 2009-08-10 3:23 ` [PATCH, RFC -V2 1/4] ext4: Add configurable run-time mballoc debugging Theodore Ts'o 2009-08-10 3:23 ` [PATCH, RFC -V2 2/4] ext4: Display the mballoc flags in mb_history in hex instead of decimal Theodore Ts'o @ 2009-08-10 3:23 ` Theodore Ts'o 2009-08-20 7:22 ` Aneesh Kumar K.V 2009-08-10 3:23 ` [PATCH, RFC -V2 4/4] ext4: Avoid group preallocation for closed files Theodore Ts'o 3 siblings, 1 reply; 15+ messages in thread From: Theodore Ts'o @ 2009-08-10 3:23 UTC (permalink / raw) To: linux-ext4; +Cc: Andreas Dilger, Alex Tomas, Theodore Ts'o The logic around sbi->s_mb_last_group and sbi->s_mb_last_start was all screwed up. These fields were getting unconditionally all the time, set even when stream allocation had not taken place, and if they were being used when the file was smaller than s_mb_stream_request, which is when the allocation should _not_ be doing stream allocation. Fix this by determining whether or not we stream allocation should take place once, in ext4_mb_group_or_file(), and setting a flag which gets used in ext4_mb_regular_allocator() and ext4_mb_use_best_found(). This simplifies the code and assures that we are consistently using (or not using) the stream allocation logic. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> --- fs/ext4/ext4.h | 2 ++ fs/ext4/mballoc.c | 23 ++++++++++------------- 2 files changed, 12 insertions(+), 13 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index e267727..70aa951 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -88,6 +88,8 @@ typedef unsigned int ext4_group_t; #define EXT4_MB_HINT_TRY_GOAL 0x0200 /* blocks already pre-reserved by delayed allocation */ #define EXT4_MB_DELALLOC_RESERVED 0x0400 +/* We are doing stream allocation */ +#define EXT4_MB_STREAM_ALLOC 0x0800 struct ext4_allocation_request { diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index f510a58..a103cb0 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -1361,7 +1361,7 @@ static void ext4_mb_use_best_found(struct ext4_allocation_context *ac, ac->alloc_semp = e4b->alloc_semp; e4b->alloc_semp = NULL; /* store last allocated for subsequent stream allocation */ - if ((ac->ac_flags & EXT4_MB_HINT_DATA)) { + if (ac->ac_flags & EXT4_MB_STREAM_ALLOC) { spin_lock(&sbi->s_md_lock); sbi->s_mb_last_group = ac->ac_f_ex.fe_group; sbi->s_mb_last_start = ac->ac_f_ex.fe_start; @@ -1939,7 +1939,6 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac) struct ext4_sb_info *sbi; struct super_block *sb; struct ext4_buddy e4b; - loff_t size, isize; sb = ac->ac_sb; sbi = EXT4_SB(sb); @@ -1975,20 +1974,16 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac) } bsbits = ac->ac_sb->s_blocksize_bits; - /* if stream allocation is enabled, use global goal */ - size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len; - isize = i_size_read(ac->ac_inode) >> bsbits; - if (size < isize) - size = isize; - if (size < sbi->s_mb_stream_request && - (ac->ac_flags & EXT4_MB_HINT_DATA)) { + /* if stream allocation is enabled, use global goal */ + if (ac->ac_flags & EXT4_MB_STREAM_ALLOC) { /* TBD: may be hot point */ spin_lock(&sbi->s_md_lock); ac->ac_g_ex.fe_group = sbi->s_mb_last_group; ac->ac_g_ex.fe_start = sbi->s_mb_last_start; spin_unlock(&sbi->s_md_lock); } + /* Let's just scan groups to find more-less suitable blocks */ cr = ac->ac_2order ? 0 : 1; /* @@ -4192,16 +4187,18 @@ static void ext4_mb_group_or_file(struct ext4_allocation_context *ac) if (!(ac->ac_flags & EXT4_MB_HINT_DATA)) return; + if (unlikely(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY)) + return; + size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len; isize = i_size_read(ac->ac_inode) >> bsbits; size = max(size, isize); /* don't use group allocation for large files */ - if (size >= sbi->s_mb_stream_request) - return; ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH, RFC -V2 3/4] ext4: Fix bugs in mballoc's stream allocation mode 2009-08-10 3:23 ` [PATCH, RFC -V2 3/4] ext4: Fix bugs in mballoc's stream allocation mode Theodore Ts'o @ 2009-08-20 7:22 ` Aneesh Kumar K.V 2009-08-20 18:20 ` Theodore Tso 0 siblings, 1 reply; 15+ messages in thread From: Aneesh Kumar K.V @ 2009-08-20 7:22 UTC (permalink / raw) To: Theodore Ts'o; +Cc: linux-ext4, Andreas Dilger, Alex Tomas On Sun, Aug 09, 2009 at 11:23:54PM -0400, Theodore Ts'o wrote: > The logic around sbi->s_mb_last_group and sbi->s_mb_last_start was all > screwed up. These fields were getting unconditionally all the time, > set even when stream allocation had not taken place, and if they were > being used when the file was smaller than s_mb_stream_request, which > is when the allocation should _not_ be doing stream allocation. > > Fix this by determining whether or not we stream allocation should > take place once, in ext4_mb_group_or_file(), and setting a flag which > gets used in ext4_mb_regular_allocator() and ext4_mb_use_best_found(). > This simplifies the code and assures that we are consistently using > (or not using) the stream allocation logic. > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> > --- > fs/ext4/ext4.h | 2 ++ > fs/ext4/mballoc.c | 23 ++++++++++------------- > 2 files changed, 12 insertions(+), 13 deletions(-) > > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h > index e267727..70aa951 100644 > --- a/fs/ext4/ext4.h > +++ b/fs/ext4/ext4.h > @@ -88,6 +88,8 @@ typedef unsigned int ext4_group_t; > #define EXT4_MB_HINT_TRY_GOAL 0x0200 > /* blocks already pre-reserved by delayed allocation */ > #define EXT4_MB_DELALLOC_RESERVED 0x0400 > +/* We are doing stream allocation */ > +#define EXT4_MB_STREAM_ALLOC 0x0800 > > > struct ext4_allocation_request { > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c > index f510a58..a103cb0 100644 > --- a/fs/ext4/mballoc.c > +++ b/fs/ext4/mballoc.c > @@ -1361,7 +1361,7 @@ static void ext4_mb_use_best_found(struct ext4_allocation_context *ac, > ac->alloc_semp = e4b->alloc_semp; > e4b->alloc_semp = NULL; > /* store last allocated for subsequent stream allocation */ > - if ((ac->ac_flags & EXT4_MB_HINT_DATA)) { > + if (ac->ac_flags & EXT4_MB_STREAM_ALLOC) { > spin_lock(&sbi->s_md_lock); > sbi->s_mb_last_group = ac->ac_f_ex.fe_group; > sbi->s_mb_last_start = ac->ac_f_ex.fe_start; > @@ -1939,7 +1939,6 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac) > struct ext4_sb_info *sbi; > struct super_block *sb; > struct ext4_buddy e4b; > - loff_t size, isize; > > sb = ac->ac_sb; > sbi = EXT4_SB(sb); > @@ -1975,20 +1974,16 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac) > } > > bsbits = ac->ac_sb->s_blocksize_bits; > - /* if stream allocation is enabled, use global goal */ > - size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len; > - isize = i_size_read(ac->ac_inode) >> bsbits; > - if (size < isize) > - size = isize; > > - if (size < sbi->s_mb_stream_request && > - (ac->ac_flags & EXT4_MB_HINT_DATA)) { > + /* if stream allocation is enabled, use global goal */ > + if (ac->ac_flags & EXT4_MB_STREAM_ALLOC) { > /* TBD: may be hot point */ > spin_lock(&sbi->s_md_lock); > ac->ac_g_ex.fe_group = sbi->s_mb_last_group; > ac->ac_g_ex.fe_start = sbi->s_mb_last_start; > spin_unlock(&sbi->s_md_lock); > } > + > /* Let's just scan groups to find more-less suitable blocks */ > cr = ac->ac_2order ? 0 : 1; > /* > @@ -4192,16 +4187,18 @@ static void ext4_mb_group_or_file(struct ext4_allocation_context *ac) > if (!(ac->ac_flags & EXT4_MB_HINT_DATA)) > return; > > + if (unlikely(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY)) > + return; > + > size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len; > isize = i_size_read(ac->ac_inode) >> bsbits; > size = max(size, isize); > > /* don't use group allocation for large files */ > - if (size >= sbi->s_mb_stream_request) > - return; > - > - if (unlikely(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY)) > + if (size >= sbi->s_mb_stream_request) { > + ac->ac_flags |= EXT4_MB_STREAM_ALLOC; > return; > + } > > BUG_ON(ac->ac_lg != NULL); > /* NAK. This would give bad allocation pattern for large files. We should be using global goal only for small files not for large files. Large files should be using neighbour allocated extent block as the goal, so that we get contiguous blocks. -aneesh ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH, RFC -V2 3/4] ext4: Fix bugs in mballoc's stream allocation mode 2009-08-20 7:22 ` Aneesh Kumar K.V @ 2009-08-20 18:20 ` Theodore Tso 0 siblings, 0 replies; 15+ messages in thread From: Theodore Tso @ 2009-08-20 18:20 UTC (permalink / raw) To: Aneesh Kumar K.V; +Cc: linux-ext4, Andreas Dilger, Alex Tomas On Thu, Aug 20, 2009 at 12:52:38PM +0530, Aneesh Kumar K.V wrote: > > This would give bad allocation pattern for large files. We should be > using global goal only for small files not for large files. Huh? Small files should be allocated within their flex_bg close to their parent directories, right? Large files are supposed to allocated globally, potentially outside of the flex_bg so they won't chew up all of the space in the local flex_bg. Also, the comments ext4.h for s_mb_last_group and s_mb_last_start indicate: "where last allocation was done - for stream allocation". If your interpretation was correct, the comment would be wrong. > Large files should be using neighbour allocated extent block as the > goal, so that we get contiguous blocks. We do use the neighbour allocated extent block as the goal. The code in question here is used only when the ext4_mb_find_by_goal() has failed. This brings up the larger problem which is the mballoc code is extremely hard to understand, and not sufficiently documented, where the algorithm is broken up so many pieces that unless you spend a long time mind-melding with the code, it's sometimes very hard to get a mental map of the forest versus the trees. - Ted ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH, RFC -V2 4/4] ext4: Avoid group preallocation for closed files 2009-08-10 3:23 [PATCH, RFC -V2 0/4] mballoc patches for ext4 Theodore Ts'o ` (2 preceding siblings ...) 2009-08-10 3:23 ` [PATCH, RFC -V2 3/4] ext4: Fix bugs in mballoc's stream allocation mode Theodore Ts'o @ 2009-08-10 3:23 ` Theodore Ts'o 2009-08-20 6:40 ` Aneesh Kumar K.V 3 siblings, 1 reply; 15+ messages in thread From: Theodore Ts'o @ 2009-08-10 3:23 UTC (permalink / raw) To: linux-ext4; +Cc: Andreas Dilger, Alex Tomas, Theodore Ts'o Currently the group preallocation code tries to find a large (512) free block from which to do per-cpu group allocation for small files. The problem with this scheme is that it leaves the filesystem horribly fragmented. In the worst case, if the filesystem is unmounted and remounted (after a system shutdown, for example) we forget the fact that wee were using a particular (now-partially filled) 512 block extent. So the next time we try to allocate space for a small file, we will find *another* completely free 512 block chunk to allocate small files. Given that there are 32,768 blocks in a block group, after 64 iterations of "mount, write one 4k file in a directory, unmount", the block group will have 64 files, each separated by 511 blocks, and the block group will no longer have any free 512 completely free chunks of blocks for group preallocation space. So if we try to allocate blocks for a file that has been closed, such that we know the final size of the file, and the filesystem is not busy, avoid using group preallocation. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> --- fs/ext4/ext4.h | 22 +++++++++++++++++++++- fs/ext4/mballoc.c | 10 +++++++++- 2 files changed, 30 insertions(+), 2 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 70aa951..a09ea10 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -952,6 +952,7 @@ struct ext4_sb_info { atomic_t s_mb_lost_chunks; atomic_t s_mb_preallocated; atomic_t s_mb_discarded; + atomic_t s_lock_busy; /* locality groups */ struct ext4_locality_group *s_locality_groups; @@ -1593,15 +1594,34 @@ struct ext4_group_info { #define EXT4_MB_GRP_NEED_INIT(grp) \ (test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state))) +#define EXT4_MAX_CONTENTION 8 +#define EXT4_CONTENTION_THRESHOLD 2 + static inline spinlock_t *ext4_group_lock_ptr(struct super_block *sb, ext4_group_t group) { return bgl_lock_ptr(EXT4_SB(sb)->s_blockgroup_lock, group); } +/* + * Returns true if the filesystem is busy enough that attempts to + * access the block group locks has run into contention. + */ +static inline int ext4_fs_is_busy(struct ext4_sb_info *sbi) +{ + return (atomic_read(&sbi->s_lock_busy) > EXT4_CONTENTION_THRESHOLD); +} + static inline void ext4_lock_group(struct super_block *sb, ext4_group_t group) { - spin_lock(ext4_group_lock_ptr(sb, group)); + spinlock_t *lock = ext4_group_lock_ptr(sb, group); + if (spin_trylock(lock)) + atomic_add_unless(&EXT4_SB(sb)->s_lock_busy, 1, + EXT4_MAX_CONTENTION); + else { + atomic_add_unless(&EXT4_SB(sb)->s_lock_busy, -1, 0); + spin_lock(lock); + } } static inline void ext4_unlock_group(struct super_block *sb, diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index a103cb0..6c7be0d 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -4191,9 +4191,17 @@ static void ext4_mb_group_or_file(struct ext4_allocation_context *ac) return; size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len; - isize = i_size_read(ac->ac_inode) >> bsbits; + isize = (i_size_read(ac->ac_inode) + ac->ac_sb->s_blocksize - 1) + >> bsbits; size = max(size, isize); + if ((size == isize) && + ext4_fs_is_busy(sbi) && + (atomic_read(&ac->ac_inode->i_writecount) == 0)) { + ac->ac_flags |= EXT4_MB_HINT_NOPREALLOC; + return; + } + /* don't use group allocation for large files */ if (size >= sbi->s_mb_stream_request) { ac->ac_flags |= EXT4_MB_STREAM_ALLOC; -- 1.6.3.2.1.gb9f7d.dirty ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH, RFC -V2 4/4] ext4: Avoid group preallocation for closed files 2009-08-10 3:23 ` [PATCH, RFC -V2 4/4] ext4: Avoid group preallocation for closed files Theodore Ts'o @ 2009-08-20 6:40 ` Aneesh Kumar K.V 2009-08-21 2:45 ` Theodore Tso 0 siblings, 1 reply; 15+ messages in thread From: Aneesh Kumar K.V @ 2009-08-20 6:40 UTC (permalink / raw) To: Theodore Ts'o; +Cc: linux-ext4, Andreas Dilger, Alex Tomas On Sun, Aug 09, 2009 at 11:23:55PM -0400, Theodore Ts'o wrote: .... > > static inline void ext4_unlock_group(struct super_block *sb, > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c > index a103cb0..6c7be0d 100644 > --- a/fs/ext4/mballoc.c > +++ b/fs/ext4/mballoc.c > @@ -4191,9 +4191,17 @@ static void ext4_mb_group_or_file(struct ext4_allocation_context *ac) > return; > > size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len; > - isize = i_size_read(ac->ac_inode) >> bsbits; > + isize = (i_size_read(ac->ac_inode) + ac->ac_sb->s_blocksize - 1) > + >> bsbits; > size = max(size, isize); > > + if ((size == isize) && What is this check supposed to help us ?. This would also imply we disable prealloc only if we are allocating the last chunk in the file. Why not just if (atomic_read(&ac->ac_inode->i_writecount) == 0) && !ext4_fs_is_busy(sbi) { ac->ac_flags |= EXT4_MB_HINT_NOPREALLOC; } > + ext4_fs_is_busy(sbi) && > + (atomic_read(&ac->ac_inode->i_writecount) == 0)) { > + ac->ac_flags |= EXT4_MB_HINT_NOPREALLOC; > + return; > + } shouldn't it be !ext4_fs_is_busy(sbi) ?. Can you also write function documentation for ext4_fs_is_busy. I found in confusing that you are decrementing s_lock_busy if we are going to spin on spin_lock. > + > /* don't use group allocation for large files */ > if (size >= sbi->s_mb_stream_request) { > ac->ac_flags |= EXT4_MB_STREAM_ALLOC; > -- -aneesh ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH, RFC -V2 4/4] ext4: Avoid group preallocation for closed files 2009-08-20 6:40 ` Aneesh Kumar K.V @ 2009-08-21 2:45 ` Theodore Tso 2009-08-21 12:23 ` Theodore Tso 0 siblings, 1 reply; 15+ messages in thread From: Theodore Tso @ 2009-08-21 2:45 UTC (permalink / raw) To: Aneesh Kumar K.V; +Cc: linux-ext4, Andreas Dilger, Alex Tomas On Thu, Aug 20, 2009 at 12:10:35PM +0530, Aneesh Kumar K.V wrote: > > + if ((size == isize) && > > What is this check supposed to help us ?. This would also imply we > disable prealloc only if we are allocating the last chunk in the > file. That was the idea, yes; the idea was to disable preallocation if the file is small enough that it could be written in a single call to ext4_da_writepages, or if we are allocating/writing the last chunk in a file. Otherwise, preallocation would be a good thing. > shouldn't it be !ext4_fs_is_busy(sbi) ?. Can you also write function > documentation for ext4_fs_is_busy. I found in confusing that you are > decrementing s_lock_busy if we are going to spin on spin_lock. Um, oops. Yeah, good point. It should be !ext4_fs_is_busy(). I also have the logic backwards in ext4_lock_group as well, so the two errors mostly cancel each other out. I'll fix that in the patch. - Ted ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH, RFC -V2 4/4] ext4: Avoid group preallocation for closed files 2009-08-21 2:45 ` Theodore Tso @ 2009-08-21 12:23 ` Theodore Tso 0 siblings, 0 replies; 15+ messages in thread From: Theodore Tso @ 2009-08-21 12:23 UTC (permalink / raw) To: Aneesh Kumar K.V; +Cc: linux-ext4, Andreas Dilger, Alex Tomas On Thu, Aug 20, 2009 at 10:45:45PM -0400, Theodore Tso wrote: > Um, oops. Yeah, good point. It should be !ext4_fs_is_busy(). I also > have the logic backwards in ext4_lock_group as well, so the two errors > mostly cancel each other out. I'll fix that in the patch. ... and here's the revised patch. - Ted ext4: Avoid group preallocation for closed files Currently the group preallocation code tries to find a large (512) free block from which to do per-cpu group allocation for small files. The problem with this scheme is that it leaves the filesystem horribly fragmented. In the worst case, if the filesystem is unmounted and remounted (after a system shutdown, for example) we forget the fact that wee were using a particular (now-partially filled) 512 block extent. So the next time we try to allocate space for a small file, we will find *another* completely free 512 block chunk to allocate small files. Given that there are 32,768 blocks in a block group, after 64 iterations of "mount, write one 4k file in a directory, unmount", the block group will have 64 files, each separated by 511 blocks, and the block group will no longer have any free 512 completely free chunks of blocks for group preallocation space. So if we try to allocate blocks for a file that has been closed, such that we know the final size of the file, and the filesystem is not busy, avoid using group preallocation. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 70aa951..b989920 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -952,6 +952,7 @@ struct ext4_sb_info { atomic_t s_mb_lost_chunks; atomic_t s_mb_preallocated; atomic_t s_mb_discarded; + atomic_t s_lock_busy; /* locality groups */ struct ext4_locality_group *s_locality_groups; @@ -1593,15 +1594,42 @@ struct ext4_group_info { #define EXT4_MB_GRP_NEED_INIT(grp) \ (test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state))) +#define EXT4_MAX_CONTENTION 8 +#define EXT4_CONTENTION_THRESHOLD 2 + static inline spinlock_t *ext4_group_lock_ptr(struct super_block *sb, ext4_group_t group) { return bgl_lock_ptr(EXT4_SB(sb)->s_blockgroup_lock, group); } +/* + * Returns true if the filesystem is busy enough that attempts to + * access the block group locks has run into contention. + */ +static inline int ext4_fs_is_busy(struct ext4_sb_info *sbi) +{ + return (atomic_read(&sbi->s_lock_busy) > EXT4_CONTENTION_THRESHOLD); +} + static inline void ext4_lock_group(struct super_block *sb, ext4_group_t group) { - spin_lock(ext4_group_lock_ptr(sb, group)); + spinlock_t *lock = ext4_group_lock_ptr(sb, group); + if (spin_trylock(lock)) + /* + * We're able to grab the lock right away, so drop the + * lock contention counter. + */ + atomic_add_unless(&EXT4_SB(sb)->s_lock_busy, -1, 0); + else { + /* + * The lock is busy, so bump the contention counter, + * and then wait on the spin lock. + */ + atomic_add_unless(&EXT4_SB(sb)->s_lock_busy, 1, + EXT4_MAX_CONTENTION); + spin_lock(lock); + } } static inline void ext4_unlock_group(struct super_block *sb, diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index a103cb0..ee563e2 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -4191,9 +4191,17 @@ static void ext4_mb_group_or_file(struct ext4_allocation_context *ac) return; size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len; - isize = i_size_read(ac->ac_inode) >> bsbits; + isize = (i_size_read(ac->ac_inode) + ac->ac_sb->s_blocksize - 1) + >> bsbits; size = max(size, isize); + if ((size == isize) && + !ext4_fs_is_busy(sbi) && + (atomic_read(&ac->ac_inode->i_writecount) == 0)) { + ac->ac_flags |= EXT4_MB_HINT_NOPREALLOC; + return; + } + /* don't use group allocation for large files */ if (size >= sbi->s_mb_stream_request) { ac->ac_flags |= EXT4_MB_STREAM_ALLOC; ^ permalink raw reply related [flat|nested] 15+ messages in thread
end of thread, other threads:[~2009-08-21 12:23 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-08-10 3:23 [PATCH, RFC -V2 0/4] mballoc patches for ext4 Theodore Ts'o 2009-08-10 3:23 ` [PATCH, RFC -V2 1/4] ext4: Add configurable run-time mballoc debugging Theodore Ts'o 2009-08-10 3:42 ` Eric Sandeen 2009-08-10 20:23 ` Theodore Tso 2009-08-11 18:15 ` Xiang Wang 2009-08-11 18:53 ` Theodore Tso 2009-08-10 3:23 ` [PATCH, RFC -V2 2/4] ext4: Display the mballoc flags in mb_history in hex instead of decimal Theodore Ts'o 2009-08-10 3:44 ` Eric Sandeen 2009-08-10 3:23 ` [PATCH, RFC -V2 3/4] ext4: Fix bugs in mballoc's stream allocation mode Theodore Ts'o 2009-08-20 7:22 ` Aneesh Kumar K.V 2009-08-20 18:20 ` Theodore Tso 2009-08-10 3:23 ` [PATCH, RFC -V2 4/4] ext4: Avoid group preallocation for closed files Theodore Ts'o 2009-08-20 6:40 ` Aneesh Kumar K.V 2009-08-21 2:45 ` Theodore Tso 2009-08-21 12:23 ` Theodore Tso
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).