linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Lukas Czerner <lczerner@redhat.com>
Cc: linux-ext4@vger.kernel.org, jmoyer@redhat.com,
	rwheeler@redhat.com, eshishki@redhat.com, sandeen@redhat.com,
	jack@suse.cz, tytso@mit.edu
Subject: Re: [PATCH 2/3] Add batched discard support for ext3
Date: Tue, 27 Jul 2010 17:43:47 +0200	[thread overview]
Message-ID: <20100727154346.GB6820@quack.suse.cz> (raw)
In-Reply-To: <1280234514-10287-3-git-send-email-lczerner@redhat.com>

On Tue 27-07-10 14:41:53, Lukas Czerner wrote:
> Walk through each allocation group and trim all free extents. It can be
> invoked through TRIM ioctl on the file system. The main idea is to
> provide a way to trim the whole file system if needed, since some SSD's
> may suffer from performance loss after the whole device was filled (it
> does not mean that fs is full!).
> 
> It search for free extents in each allocation group. When the free
> extent is found, blocks are marked as used and then trimmed. Afterwards
> these blocks are marked as free in per-group bitmap.
  Thanks for the patch. It looks much better than the previous one. Some
comments below.

> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> ---
>  fs/ext3/balloc.c        |  229 +++++++++++++++++++++++++++++++++++++++++++++++
>  fs/ext3/super.c         |    1 +
>  include/linux/ext3_fs.h |    1 +
>  3 files changed, 231 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ext3/balloc.c b/fs/ext3/balloc.c
> index a177122..97c6867 100644
> --- a/fs/ext3/balloc.c
> +++ b/fs/ext3/balloc.c
> @@ -20,6 +20,7 @@
>  #include <linux/ext3_jbd.h>
>  #include <linux/quotaops.h>
>  #include <linux/buffer_head.h>
> +#include <linux/blkdev.h>
>  
>  /*
>   * balloc.c contains the blocks allocation and deallocation routines
> @@ -1876,3 +1877,231 @@ unsigned long ext3_bg_num_gdb(struct super_block *sb, int group)
>  	return ext3_bg_num_gdb_meta(sb,group);
>  
>  }
> +
> +/**
> + * ext3_trim_all_free -- function to trim all free space in alloc. group
> + * @sb:			super block for file system
> + * @group:		allocation group to trim
> + * @gdp:		allocation group description structure
> + * @minblocks:		minimum extent block count
> + *
> + * ext3_trim_all_free walks through group's block bitmap searching for free
> + * blocks. When the free block is found, it tries to allocate this block and
> + * consequent free block to get the biggest free extent possible, until it
> + * reaches any used block. Then issue a TRIM command on this extent and free
> + * the extent in the block bitmap. This is done until whole group is scanned.
> + */
> +ext3_grpblk_t ext3_trim_all_free(struct super_block *sb,
> +			unsigned int group, ext3_grpblk_t minblocks)
> +{
> +	handle_t *handle;
> +	ext3_grpblk_t max = EXT3_BLOCKS_PER_GROUP(sb);
> +	ext3_grpblk_t next, count = 0, start, bit;
> +	struct ext3_sb_info *sbi;
> +	ext3_fsblk_t discard_block;
> +	struct buffer_head *bitmap_bh = NULL;
> +	struct buffer_head *gdp_bh;
> +	ext3_grpblk_t free_blocks;
> +	struct ext3_group_desc *gdp;
> +	int err = 0, ret;
> +	ext3_grpblk_t freed;
> +
> +	/**
> +	 * We will update one block bitmap, and one group descriptor
> +	 */
> +	handle = ext3_journal_start_sb(sb, 2);
> +	if (IS_ERR(handle)) {
> +		err = PTR_ERR(handle);
> +		ext3_warning(sb, __func__, "error %d on journal start", err);
  I think there's no need to issue another warning here... When
journal_start fails there should be notices about this in the log already.

> +		return err;
> +	}
> +
> +	bitmap_bh = read_block_bitmap(sb, group);
> +	if (!bitmap_bh)
> +		goto err_out;
> +
> +	BUFFER_TRACE(bitmap_bh, "getting undo access");
> +	err = ext3_journal_get_undo_access(handle, bitmap_bh);
> +	if (err)
> +		goto err_out;
> +
> +	gdp = ext3_get_group_desc(sb, group, &gdp_bh);
> +	if (!gdp)
> +		goto err_out;
> +
> +	BUFFER_TRACE(gd_bh, "get_write_access");
> +	err = ext3_journal_get_write_access(handle, gdp_bh);
> +	if (err)
> +		goto err_out;
> +
> +	free_blocks = le16_to_cpu(gdp->bg_free_blocks_count);
> +	sbi = EXT3_SB(sb);
> +
> +	 /* Walk through the whole group */
> +	start = 0;
> +	while (start < max) {
> +
> +		start = bitmap_search_next_usable_block(start, bitmap_bh, max);
> +		if (start < 0)
> +			break;
> +		next = start;
> +
> +		/**
                ^^^ Use just /*. /** is reserved for comments processed by
  kernel documentation generator and they have to have special format...

> +		 * Allocate contiguous free extents by setting bits in the
> +		 * block bitmap
> +		 */
> +		while (next < max
> +			&& claim_block(sb_bgl_lock(sbi, group),
> +					next, bitmap_bh)) {
> +			next++;
> +		}
> +
> +		 /* We did not claimed any blocks */
                               ^^^^ claim

> +		if (next == start)
> +			continue;
> +
> +		discard_block = (ext3_fsblk_t)start +
> +				ext3_group_first_block_no(sb, group);
> +
> +		/* Update counters */
> +		spin_lock(sb_bgl_lock(sbi, group));
> +		le16_add_cpu(&gdp->bg_free_blocks_count, start - next);
> +		spin_unlock(sb_bgl_lock(sbi, group));
> +		percpu_counter_sub(&sbi->s_freeblocks_counter, next - start);
> +
> +		/* Do not issue a TRIM on extents smaller than minblocks */
> +		if ((next - start) < minblocks)
> +			goto free_extent;
> +
> +		 /* Send the TRIM command down to the device */
> +		sb_issue_discard(sb, discard_block, next - start);
> +		count += (next - start);
> +		cond_resched();
  Maybe you could cond_resched() after freeing the blocks instead of here?
So that we don't sleep with blocks unnecessarily allocated?

> +
> +free_extent:
> +
> +		freed = 0;
> +		jbd_lock_bh_state(bitmap_bh);
> +
> +		for (bit = start; bit < next; bit++) {
> +
> +			/**
> +			 * @@@ This prevents newly-allocated data from being
> +			 * freed and then reallocated within the same
> +			 * transaction.
> +			 */
> +			BUFFER_TRACE(bitmap_bh, "set in b_committed_data");
> +			J_ASSERT_BH(bitmap_bh,
> +				bh2jh(bitmap_bh)->b_committed_data != NULL);
> +			ext3_set_bit_atomic(sb_bgl_lock(sbi, group), bit,
> +				bh2jh(bitmap_bh)->b_committed_data);
  You don't have to do this. Since you didn't really allocate the blocks
(you are actually never going to commit bitmap changes to the journal)
blocks can be happily reallocated just after you clear the bits in the
bitmap. There's no problem with that. Also you don't have to hold bh_state
lock at all.

> +
> +			/**
> +			 * We clear the bit in the bitmap after setting the
> +			 * committed data bit, because this is the reverse
> +			 * order to that which the allocator uses.
> +			 */
> +			BUFFER_TRACE(bitmap_bh, "clear bit");
> +			if (!ext3_clear_bit_atomic(sb_bgl_lock(sbi, group),
> +						bit, bitmap_bh->b_data)) {
> +				jbd_unlock_bh_state(bitmap_bh);
> +				ext3_error(sb, __func__,
> +					"bit already cleared for block "E3FSBLK,
> +					 (unsigned long)bit);
> +				jbd_lock_bh_state(bitmap_bh);
> +				BUFFER_TRACE(bitmap_bh, "bit already cleared");
> +			} else {
> +				freed++;
> +			}
> +		}
> +
> +		jbd_unlock_bh_state(bitmap_bh);
> +
> +		/* Update couters */
> +		spin_lock(sb_bgl_lock(sbi, group));
> +		le16_add_cpu(&gdp->bg_free_blocks_count, freed);
> +		spin_unlock(sb_bgl_lock(sbi, group));
> +		percpu_counter_add(&sbi->s_freeblocks_counter, next - start);
> +
> +		start = next;
> +
> +		if (signal_pending(current)) {
> +			count = -ERESTARTSYS;
> +			break;
> +		}
> +
> +		/* No more suitable extents */
> +		if ((free_blocks - count) < minblocks)
> +			break;
> +	}
> +
> +	/* We dirtied the bitmap block */
> +	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
> +	err = ext3_journal_dirty_metadata(handle, bitmap_bh);
> +
> +	/* And the group descriptor block */
> +	BUFFER_TRACE(gdp_bh, "dirtied group descriptor block");
> +	ret = ext3_journal_dirty_metadata(handle, gdp_bh);
> +	if (!err)
> +		err = ret;
> +
> +	ext3_debug("trimmed %d blocks in the group %d\n",
> +		count, group);
> +
> +err_out:
> +	if (err) {
> +		ext3_std_error(sb, err);
> +		count = -err;
> +	}
> +
> +	ext3_journal_stop(handle);
> +	brelse(bitmap_bh);
> +
> +	return count;
> +}
  Otherwise the patch looks OK.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

  reply	other threads:[~2010-07-27 15:44 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-27 12:41 [PATCH 0/3 v3] Batched discard support for Ext3/Ext4 Lukas Czerner
2010-07-27 12:41 ` [PATCH 1/3] Add ioctl FITRIM Lukas Czerner
2010-07-27 12:41 ` [PATCH 2/3] Add batched discard support for ext3 Lukas Czerner
2010-07-27 15:43   ` Jan Kara [this message]
2010-07-28  9:13     ` Lukas Czerner
2010-07-27 12:41 ` [PATCH 3/3] Add batched discard support for ext4 Lukas Czerner
2010-07-27 16:28   ` Jan Kara
  -- strict thread matches above, loose matches on Subject: below --
2010-08-04 13:44 [PATCH 1/3] Add ioctl FITRIM Lukas Czerner
2010-08-04 13:44 ` [PATCH 2/3] Add batched discard support for ext3 Lukas Czerner
2010-08-04 14:03   ` Jan Kara
2010-08-04 14:32     ` Lukas Czerner
2010-08-04 19:39   ` Andreas Dilger
2010-08-05 14:00     ` Lukas Czerner
2010-08-06 11:31 [PATCH 0/3] Batched discard support Lukas Czerner
2010-08-06 11:31 ` [PATCH 2/3] Add batched discard support for ext3 Lukas Czerner
2010-08-10 14:19 [PATCH 0/3 ver. 7] Ext3/Ext4 Batched discard support Lukas Czerner
2010-08-10 14:19 ` [PATCH 2/3] Add batched discard support for ext3 Lukas Czerner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100727154346.GB6820@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=eshishki@redhat.com \
    --cc=jmoyer@redhat.com \
    --cc=lczerner@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=rwheeler@redhat.com \
    --cc=sandeen@redhat.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).