public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Coly Li <coyli@suse.de>
To: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ext4: dir inode reservation V3
Date: Wed, 21 Nov 2007 00:40:17 +0800	[thread overview]
Message-ID: <47430DF1.8090808@suse.de> (raw)
In-Reply-To: <20071120155826.GB31177@atrey.karlin.mff.cuni.cz>

Jan,

Thanks for taking time to review the patch :-)

Jan Kara wrote:
>   Hi Coly,
> 
>   finally I've found some time to have a look at a new version of your
> patch.
> 
>> 5, Performance number
>> On a Core-Duo, 2MB DDM memory, 7200 RPM SATA PC, I built a 50GB ext4
>> partition, and tried to create 50000 directories, and create 15 (1KB)
>> files in each directory alternatively. After a remount, I tried to
>> remove all the directories and files recursively by a 'rm -rf'. Bellow
>> is the benchmark result,
>> 			normal ext4			ext4 with dir inode reservation
>> 	mount options:	-o data=writeback		-o data=writeback,dir_ireserve=low
>> 	Create dirs:	real    0m49.101s		real    2m59.703s
>> 	Create files:	real    24m17.962s		real    21m8.161s
>> 	Unlink all:	real    24m43.788s		real    17m29.862s
>> Creating dirs with dir inode reservation is slower than normal ext4 as
>> predicted, because allocating directory inodes in non-linear order
>> will cause extra hard disk seeking and block I/O. Creating files with
>> dir inode reservation is 13% faster than normal ext4. Unlink all the
>> directories and files is 29.2% faster as expected.  When number of
>> directories is increased, the performance improvement will be more
>> considerable. More benchmark result will be posted here if necessary,
>> because I need more time to run more test cases.
>   Maybe to get some better idea - could you compare how long does take
> untar of a kernel tree, find through the whole kernel tree and removal
> of the tree? Also measuring CPU load during your benchmarks would be
> useful so that we can see whether we don't increase too much the cost of
> searching in bitmaps...

Sure, good ideas, I will add these items to my benchmark list.

> 
>   The patch is nicely short ;)
Thanks for the encouragement.
The first version was more than 2000 lines, including kernel and e2fsprogs patches. Finally I find
only around 100 lines kernel patch can achieve same performance too. Really nice :-)

> 
>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
>> index 17b5df1..f838a72 100644
>> --- a/fs/ext4/ialloc.c
>> +++ b/fs/ext4/ialloc.c
>> @@ -10,6 +10,8 @@
>>   *  Stephen Tweedie (sct@redhat.com), 1993
>>   *  Big-endian to little-endian byte-swapping/bitmaps by
>>   *        David S. Miller (davem@caip.rutgers.edu), 1995
>> + *  Directory inodes reservation by
>> + *        Coly Li (coyli@suse.de), 2007
>>   */
>>
>>  #include <linux/time.h>
>> @@ -478,6 +480,75 @@ static int find_group_other(struct super_block *sb, struct inode *parent,
>>  	return -1;
>>  }
>>
>> +static int ext4_ino_from_ireserve(handle_t *handle, struct inode *dir,
>> +			  int mode, ext4_group_t *group, unsigned long *ino)
>> +{
>> +	struct super_block *sb;
>> +	struct ext4_sb_info *sbi;
>> +	struct ext4_group_desc *gdp = NULL;
>> +	struct buffer_head *gdp_bh = NULL, *bitmap_bh = NULL;
>> +	ext4_group_t ires_group = *group;
>> +	unsigned long ires_ino;
>> +	int i, bit;
>> +
>> +	sb = dir->i_sb;
>> +	sbi = EXT4_SB(sb);
>> +
>> +	/* if the inode number is not for directory,
>> +	 * only try to allocate after directory's inode
>> +	 */
>> +	if (!S_ISDIR(mode)) {
>> +		*ino = dir->i_ino % EXT4_INODES_PER_GROUP(sb);
>> +		return 0;
>> +	}
>   ^^^ You don't set a group here - is this intentional? It means we get
> the group from find_group_other() and there we start searching at a
> place corresponding to parent's inode number... That would be an
> interesting heuristic but I'm not sure if that's what you want.

Yes, if allocating a file inode, just return first inode offset in the reserved area of parent
directory. In this case, group is decided by find_group_other() or find_group_orlov(),
ext4_ino_from_ireserve() just tries to persuade linear inode allocator to search free inode slot
after parent's inode.

> 
>> +
>> +	/* reserve inodes for new directory */
>> +	for (i = 0; i < sbi->s_groups_count; i++) {
>> +		gdp = ext4_get_group_desc(sb, ires_group, &gdp_bh);
>> +		if (!gdp)
>> +			goto fail;
>> +		bit = 0;
>> +try_same_group:
>> +		if (bit < EXT4_INODES_PER_GROUP(sb)) {
>> +			brelse(bitmap_bh);
>> +			bitmap_bh = read_inode_bitmap(sb, ires_group);
>> +			if (!bitmap_bh)
>> +				goto fail;
>> +
>> +			BUFFER_TRACE(bitmap_bh, "get_write_access");
>> +			if (ext4_journal_get_write_access(
>> +				handle, bitmap_bh) != 0)
>> +				goto fail;
>> +			if (!ext4_set_bit_atomic(sb_bgl_lock(sbi, ires_group),
>> +					bit, bitmap_bh->b_data)) {
>> +				/* we won it */
>> +				BUFFER_TRACE(bitmap_bh,
>> +					"call ext4_journal_dirty_metadata");
>> +				if (ext4_journal_dirty_metadata(handle,
>> +							bitmap_bh) != 0)
>> +					goto fail;
>> +				ires_ino = bit;
>> +				goto find;
>> +			}
>> +			/* we lost it */
>> +			jbd2_journal_release_buffer(handle, bitmap_bh);
>> +			bit += sbi->s_dir_ireserve_nr;
>> +			goto try_same_group;
>> +		}
>      So this above is just a while loop coded with goto... While loop
> would be IMO better.

The only reason for me to use a goto, is 80 column limitation :) BTW, goto does not hurt performance
and readability here. IMHO, it's acceptable :-)

> 
> 									Honza

Today I finished V4 patch, which fixs  a bug and provides new mount option to configure reserved
inodes number. It can be faster on creating empty directories, I will release it once I finished
basic benchmark. Your advice on benchmark is valuable, I will post the benchmark result next week.

Best regards.

-- 
Coly Li
SuSE PRC Labs

  reply	other threads:[~2007-11-20 16:32 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-11-13 14:12 [PATCH] ext4: dir inode reservation V3 Coly Li
2007-11-13 14:09 ` Alex Tomas
2007-11-13 16:27   ` Coly Li
2007-11-13 16:43     ` Coly Li
2007-11-20  2:01 ` Mingming Cao
2007-11-20  4:14   ` Coly Li
2007-11-20 20:22     ` Mingming Cao
2007-11-21  2:09       ` Andreas Dilger
2007-11-20 15:58 ` Jan Kara
2007-11-20 16:40   ` Coly Li [this message]
2007-11-20 16:44     ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47430DF1.8090808@suse.de \
    --to=coyli@suse.de \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox