Re: [PATCH 3/3 v5] ext4: reduce lock contention in __ext4_new_inode

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jan Kara <jack@suse.cz>
To: Wang Shilong <wangshilong1991@gmail.com>
Cc: linux-ext4@vger.kernel.org, sihara@ddn.com, tytso@mit.edu,
	adilger@dilger.ca, lixi@ddn.com, wshilong@ddn.com, jack@suse.cz
Subject: Re: [PATCH 3/3 v5] ext4: reduce lock contention in __ext4_new_inode
Date: Mon, 21 Aug 2017 10:02:20 +0200	[thread overview]
Message-ID: <20170821080220.GA19196@quack2.suse.cz> (raw)
In-Reply-To: <20170820115437.57074-1-wshilong@ddn.com>

On Sun 20-08-17 19:54:37, Wang Shilong wrote:
> From: Wang Shilong <wshilong@ddn.com>
> 
> While running number of creating file threads concurrently,
> we found heavy lock contention on group spinlock:
> 
> FUNC                           TOTAL_TIME(us)       COUNT        AVG(us)
> ext4_create                    1707443399           1440000      1185.72
> _raw_spin_lock                 1317641501           180899929    7.28
> jbd2__journal_start            287821030            1453950      197.96
> jbd2_journal_get_write_access  33441470             73077185     0.46
> ext4_add_nondir                29435963             1440000      20.44
> ext4_add_entry                 26015166             1440049      18.07
> ext4_dx_add_entry              25729337             1432814      17.96
> ext4_mark_inode_dirty          12302433             5774407      2.13
> 
> most of cpu time blames to _raw_spin_lock, here is some testing
> numbers with/without patch.
> 
> Test environment:
> Server : SuperMicro Sever (2 x E5-2690 v3@2.60GHz, 128GB 2133MHz
>          DDR4 Memory, 8GbFC)
> Storage : 2 x RAID1 (DDN SFA7700X, 4 x Toshiba PX02SMU020 200GB
>           Read Intensive SSD)
> 
> format command:
>         mkfs.ext4 -J size=4096
> 
> test command:
>         mpirun -np 48 mdtest -n 30000 -d /ext4/mdtest.out -F -C \
>                 -r -i 1 -v -p 10 -u #first run to load inode
> 
>         mpirun -np 48 mdtest -n 30000 -d /ext4/mdtest.out -F -C \
>                 -r -i 3 -v -p 10 -u
> 
> Kernel version: 4.13.0-rc3
> 
> Test  1,440,000 files with 48 directories by 48 processes:
> 
> Without patch:
> 
> File Creation   File removal
> 79,033          289,569 ops/per second
> 81,463          285,359
> 79,875          288,475
> 
> With patch:
> File Creation   File removal
> 810669		301694
> 812805		302711
> 813965		297670
> 
> Creation performance is improved more than 10X with large
> journal size. The main problem here is we test bitmap
> and do some check and journal operations which could be
> slept, then we test and set with lock hold, this could
> be racy, and make 'inode' steal by other process.
> 
> However, after first try, we could confirm handle has
> been started and inode bitmap journaled too, then
> we could find and set bit with lock hold directly, this
> will mostly gurateee success with second try.
> 
> Tested-by: Shuichi Ihara <sihara@ddn.com>
> Signed-off-by: Wang Shilong <wshilong@ddn.com>

I like the result! You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
> v4->v5: better refactor codes as Jan Kara suggested.
> ---
>  fs/ext4/ialloc.c | 50 +++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 39 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 9e6eb1c..fb83a36 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -730,6 +730,27 @@ static int recently_deleted(struct super_block *sb, ext4_group_t group, int ino)
>  	return ret;
>  }
>  
> +static int find_inode_bit(struct super_block *sb, ext4_group_t group,
> +			  struct buffer_head *bitmap, unsigned long *ino)
> +{
> +next:
> +	*ino = ext4_find_next_zero_bit((unsigned long *)
> +				       bitmap->b_data,
> +				       EXT4_INODES_PER_GROUP(sb), *ino);
> +	if (*ino >= EXT4_INODES_PER_GROUP(sb))
> +		return 0;
> +
> +	if ((EXT4_SB(sb)->s_journal == NULL) &&
> +	    recently_deleted(sb, group, *ino)) {
> +		*ino = *ino + 1;
> +		if (*ino < EXT4_INODES_PER_GROUP(sb))
> +			goto next;
> +		return 0;
> +	}
> +
> +	return 1;
> +}
> +
>  /*
>   * There are two policies for allocating an inode.  If the new inode is
>   * a directory, then a forward search is made for a block group with both
> @@ -910,21 +931,16 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
>  		}
>  
>  repeat_in_this_group:
> -		ino = ext4_find_next_zero_bit((unsigned long *)
> -					      inode_bitmap_bh->b_data,
> -					      EXT4_INODES_PER_GROUP(sb), ino);
> -		if (ino >= EXT4_INODES_PER_GROUP(sb))
> +		ret2 = find_inode_bit(sb, group, inode_bitmap_bh, &ino);
> +		if (!ret2)
>  			goto next_group;
> -		if (group == 0 && (ino+1) < EXT4_FIRST_INO(sb)) {
> +
> +		if (group == 0 && (ino + 1) < EXT4_FIRST_INO(sb)) {
>  			ext4_error(sb, "reserved inode found cleared - "
>  				   "inode=%lu", ino + 1);
>  			goto next_group;
>  		}
> -		if ((EXT4_SB(sb)->s_journal == NULL) &&
> -		    recently_deleted(sb, group, ino)) {
> -			ino++;
> -			goto next_inode;
> -		}
> +
>  		if (!handle) {
>  			BUG_ON(nblocks <= 0);
>  			handle = __ext4_journal_start_sb(dir->i_sb, line_no,
> @@ -944,11 +960,23 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
>  		}
>  		ext4_lock_group(sb, group);
>  		ret2 = ext4_test_and_set_bit(ino, inode_bitmap_bh->b_data);
> +		if (ret2) {
> +			/* Someone already took the bit. Repeat the search
> +			 * with lock held.
> +			 */
> +			ret2 = find_inode_bit(sb, group, inode_bitmap_bh, &ino);
> +			if (ret2) {
> +				ext4_set_bit(ino, inode_bitmap_bh->b_data);
> +				ret2 = 0;
> +			} else {
> +				ret2 = 1; /* we didn't grab the inode */
> +			}
> +		}
>  		ext4_unlock_group(sb, group);
>  		ino++;		/* the inode bitmap is zero-based */
>  		if (!ret2)
>  			goto got; /* we grabbed the inode! */
> -next_inode:
> +
>  		if (ino < EXT4_INODES_PER_GROUP(sb))
>  			goto repeat_in_this_group;
>  next_group:
> -- 
> 2.9.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

next prev parent reply	other threads:[~2017-08-21  8:02 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-20 11:54 [PATCH 3/3 v5] ext4: reduce lock contention in __ext4_new_inode Wang Shilong
2017-08-21  8:02 ` Jan Kara [this message]
2017-08-24 17:07   ` Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170821080220.GA19196@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=adilger@dilger.ca \
    --cc=linux-ext4@vger.kernel.org \
    --cc=lixi@ddn.com \
    --cc=sihara@ddn.com \
    --cc=tytso@mit.edu \
    --cc=wangshilong1991@gmail.com \
    --cc=wshilong@ddn.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).