Re: [PATCH] New inode allocation for FLEX_BG meta-data groups.

Linux EXT4 FS development
 help / color / mirror / Atom feed

From: "Jose R. Santos" <jrs@us.ibm.com>
To: Andreas Dilger <adilger@sun.com>
Cc: linux-ext4 <linux-ext4@vger.kernel.org>, Mingming Cao <cmm@us.ibm.com>
Subject: Re: [PATCH] New inode allocation for FLEX_BG meta-data groups.
Date: Fri, 11 Jan 2008 16:44:17 -0600	[thread overview]
Message-ID: <20080111164417.328b7505@gara> (raw)
In-Reply-To: <20080111214658.GS3351@webber.adilger.int>

On Fri, 11 Jan 2008 14:46:58 -0700
Andreas Dilger <adilger@sun.com> wrote:

> On Jan 11, 2008  11:28 -0600, Jose R. Santos wrote:
> > @@ -127,6 +127,8 @@ unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
> >  		mark_bitmap_end(group_blocks, sb->s_blocksize * 8, bh->b_data);
> >  	}
> >  
> > +	if (sbi->s_log_groups_per_flex)
> > +		return free_blocks;
> >  	return free_blocks - sbi->s_itb_per_group - 2;
> 
> To be honest, I think this is a wart in ext4_init_block_bitmap() that
> it returns the number of free blocks in the group.  That value should
> really be set at mke2fs or e2fsck time, and if the last group is marked
> BLOCK_UNINIT it gets the free blocks count wrong because it always starts
> with EXT4_BLOCKS_PER_GROUP().
> 
> The above patch may also be incorrect since there may be inode tables or
> bitmaps in the above group even in the case of FLEX_BG filesystems.
> 
> > +#define free_block_ratio 10
> > +
> > +static int find_group_flex(struct super_block *sb, struct inode *parent, ext4_group_t *best_group)
> > +{
> > +	n_fbg_groups = (sbi->s_groups_count + flex_size - 1) / flex_size;
> 
> Can be a shift?

You're right.  This should be:

n_fbg_groups = (sbi->s_groups_count + flex_size - 1) >> sbi->s_log_groups_per_flex;

> I would suggest doing some kind of testing to see how well this allocation
> policy is working.  We don't want to force all allocations contiguously at
> the start of the filesystem, or we end up with FAT...

I've done several IO patterns with multiple threads and all of my test
are either same or faster performance than with the regular allocator.
Im hopping that moving this to the patch queue will expose the
allocator to more tests.
 
> > +static int ext4_fill_flex_info(struct super_block *sb)
> > +{
> > +	sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex;
> 
> Hmm, I guess no le*_to_cpu() because this is 8 bits?

Correct.

> > +
> > +	flex_group_count = (sbi->s_groups_count + groups_per_flex - 1) /
> > +		groups_per_flex;
> > +	sbi->s_flex_groups = kmalloc(flex_group_count *
> > +				     sizeof(struct flex_groups), GFP_KERNEL);
> > +	if (sbi->s_flex_groups == NULL) {
> > +		printk(KERN_ERR "EXT4-fs: not enough memory\n");
> 
> This should report "not enough memory for N flex groups" or something.

OK.
 
> > @@ -2105,6 +2154,13 @@ static int ext4_fill_super (struct super_block *sb,
> > +	if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG))
> > +		if (!ext4_fill_flex_info(sb)) {
> > +			printk(KERN_ERR
> > +			       "EXT4-fs: unable to initialize flex_bg meta info!\n");
> > +			goto failed_mount2;
> 
> Should this be considered a fatal error, or could sbi->s_log_groups_per_flex
> just be set to 0 and the filesystem be used as-is (maybe with sub-optimal
> allocations or something)?  Otherwise this renders the filesystem unusable.

I thought about doing that but using a sub-optimal allocator would
permanently screw up the ondisk data locality.  Maybe mounting the
filesystem read-only would be more appropriate.

> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
> 



-JRS

     prev parent reply	other threads:[~2008-01-11 22:44 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-11 17:28 [PATCH] New inode allocation for FLEX_BG meta-data groups Jose R. Santos
2008-01-11 21:46 ` Andreas Dilger
2008-01-11 22:44   ` Jose R. Santos [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080111164417.328b7505@gara \
    --to=jrs@us.ibm.com \
    --cc=adilger@sun.com \
    --cc=cmm@us.ibm.com \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox