linux-ext4.vger.kernel.org archive mirror
* [PATCH 0/4 v. 9] Ext3/Ext4 Batched discard support
@ 2010-09-27 14:09 Lukas Czerner
  2010-09-27 14:09 ` [PATCH 1/4] ext4: Use return value from sb_issue_discard() Lukas Czerner
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Lukas Czerner @ 2010-09-27 14:09 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, rwheeler, sandeen, adilger, lczerner

Hi,

just some minor changes here. As Andreas pointed out, it is better to
have a dedicated structure to pass FITRIM arguments into the ioctl,
instead of just an array of "noname" elements. So I have introduced this
structure:

struct fstrim_range {
	uint64_t start;
	uint64_t len;
	uint64_t minlen;
};

The tool which uses the FITRIM ioctl has been updated as well.


SHORT DESCRIPTION:
==================

Batched discard adds the ability to discard free space on a mounted
filesystem, as an alternative to the current discard implementation,
which discards recently freed blocks. On some devices (depending on how
efficient the device's wear-leveling algorithm is) the current approach
may result in a huge performance loss.

Batched discard can be invoked from user space through the FITRIM ioctl
on the whole file system or just a part of it. With this approach we
search for contiguous free extents larger than the minimum length passed
through the ioctl and discard them. Since we only look for large
contiguous extents, this is much more efficient than the current
approach, and it gives the user fine-grained control over how much disk
space is reclaimed for wear-leveling and what impact that has on
performance.
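
The extent search described above can be sketched on a toy bitmap. This
is only an illustration, not the code from the patches: the helper names
(bit_is_set, count_trimmable) and the bitmap layout (one bit per block,
set bit = block in use) are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if block `bit` is in use in `bitmap`. */
static int bit_is_set(const uint8_t *bitmap, size_t bit)
{
	return (bitmap[bit / 8] >> (bit % 8)) & 1;
}

/*
 * Walk blocks [start, max) of a toy block bitmap and count the blocks
 * that belong to free extents of at least `minblocks` blocks. A real
 * implementation would issue a discard for each such extent instead of
 * only counting it.
 */
static size_t count_trimmable(const uint8_t *bitmap, size_t start, size_t max,
			      size_t minblocks)
{
	size_t trimmed = 0;

	while (start < max) {
		size_t next;

		/* Skip used blocks to find the start of a free extent. */
		while (start < max && bit_is_set(bitmap, start))
			start++;
		if (start >= max)
			break;

		/* Find the first used block after the extent (or max). */
		next = start;
		while (next < max && !bit_is_set(bitmap, next))
			next++;

		/* Extents shorter than minblocks are skipped entirely. */
		if (next - start >= minblocks)
			trimmed += next - start;

		start = next;
	}
	return trimmed;
}
```

Raising minblocks reduces the number of discard requests at the cost of
leaving small free extents untrimmed, which is the trade-off the minimum
extent length is meant to expose.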


I have attached the source code of an example application which uses
FITRIM to discard a part of, or the whole, filesystem. Since FITRIM is a
filesystem-independent ioctl, it can be used with any filesystem which
supports it.

Usage: fstrim [-s start] [-l length] [-m minimum-extent] [-v] {mountpoint}
        -s Starting Byte to discard from
        -l Number of Bytes to discard from the start
        -m Minimum extent length to discard
        -v Verbose - number of discarded bytes

---
bd6a5a3 ext3: Add batched discard support for ext3
9dcabb2 ext4: Add batched discard support for ext4
9c8c3a5 Add ioctl FITRIM.
787dbea ext4: Use return value from sb_issue_discard()

 fs/ext3/balloc.c        |  256 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/ext3/super.c         |    1 +
 fs/ext4/ext4.h          |    2 +
 fs/ext4/mballoc.c       |  194 +++++++++++++++++++++++++++++++++++-
 fs/ext4/super.c         |    1 +
 fs/ioctl.c              |   39 +++++++
 include/linux/ext3_fs.h |    1 +
 include/linux/fs.h      |    8 ++
 8 files changed, 501 insertions(+), 1 deletions(-)


* [PATCH 1/4] ext4: Use return value from sb_issue_discard()
  2010-09-27 14:09 [PATCH 0/4 v. 9] Ext3/Ext4 Batched discard support Lukas Czerner
@ 2010-09-27 14:09 ` Lukas Czerner
  2010-09-27 14:09 ` [PATCH 2/4] Add ioctl FITRIM Lukas Czerner
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Lukas Czerner @ 2010-09-27 14:09 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, rwheeler, sandeen, adilger, lczerner

Use the return value from sb_issue_discard() as the return value of
ext4_issue_discard(). Since sb_issue_discard() may fail with more
serious errors than just -EOPNOTSUPP, it is worth passing them on to the
callers of this function so they can handle error cases properly.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
 fs/ext4/mballoc.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 66c3535..93eb6c2 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2603,7 +2603,7 @@ int ext4_mb_release(struct super_block *sb)
 	return 0;
 }
 
-static inline void ext4_issue_discard(struct super_block *sb,
+static inline int ext4_issue_discard(struct super_block *sb,
 		ext4_group_t block_group, ext4_grpblk_t block, int count)
 {
 	int ret;
@@ -2617,6 +2617,7 @@ static inline void ext4_issue_discard(struct super_block *sb,
 		ext4_warning(sb, "discard not supported, disabling");
 		clear_opt(EXT4_SB(sb)->s_mount_opt, DISCARD);
 	}
+	return ret;
 }
 
 /*
-- 
1.7.2.3



* [PATCH 2/4] Add ioctl FITRIM.
  2010-09-27 14:09 [PATCH 0/4 v. 9] Ext3/Ext4 Batched discard support Lukas Czerner
  2010-09-27 14:09 ` [PATCH 1/4] ext4: Use return value from sb_issue_discard() Lukas Czerner
@ 2010-09-27 14:09 ` Lukas Czerner
  2010-09-27 14:09 ` [PATCH 3/4] ext4: Add batched discard support for ext4 Lukas Czerner
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Lukas Czerner @ 2010-09-27 14:09 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, rwheeler, sandeen, adilger, lczerner

Adds a filesystem-independent ioctl to allow implementation of file
system batched discard support. It takes an fstrim_range structure as
its argument. fstrim_range is defined in include/linux/fs.h and its
definition is as follows.

struct fstrim_range {
	uint64_t start;
	uint64_t len;
	uint64_t minlen;
};

start	- first Byte to trim
len	- number of Bytes to trim from start
minlen	- minimum extent length to trim, free extents shorter than this
	  number of Bytes will be ignored. This will be rounded up to fs
	  block size.

It is also possible to specify NULL as the argument. In this case the
values default to:

start = 0;
len = ULLONG_MAX;
minlen = 0;

So it will trim the whole file system in one run.

After the FITRIM is done, the number of actually discarded Bytes is stored
in fstrim_range.len to give the user better insight on how much storage
space has been really released for wear-leveling.
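
A minimal user-space sketch of the calling convention described above.
The wrapper name trim_range() and its error convention (negative errno)
are assumptions made for this example; only FITRIM and struct
fstrim_range come from this patch.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

#ifndef FITRIM
/* Fallback for headers that predate this series (same layout as above). */
struct fstrim_range {
	uint64_t start;
	uint64_t len;
	uint64_t minlen;
};
#define FITRIM		_IOWR('X', 121, struct fstrim_range)
#endif

/*
 * Hypothetical wrapper: trim the Byte range [start, start + len) of the
 * filesystem mounted at `mountpoint`, ignoring free extents shorter than
 * `minlen` Bytes. Returns 0 and stores the number of discarded Bytes in
 * *trimmed on success, or a negative errno value on failure.
 */
static int trim_range(const char *mountpoint, uint64_t start, uint64_t len,
		      uint64_t minlen, uint64_t *trimmed)
{
	struct fstrim_range range = {
		.start = start,
		.len = len,
		.minlen = minlen,
	};
	int fd, ret = 0;

	fd = open(mountpoint, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* On success the kernel writes the discarded Byte count to len. */
	if (ioctl(fd, FITRIM, &range) < 0)
		ret = -errno;
	else
		*trimmed = range.len;

	close(fd);
	return ret;
}
```

Calling trim_range(path, 0, UINT64_MAX, 0, &n) corresponds to the NULL
defaults listed above, except that the discarded Byte count is reported
back.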

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ioctl.c         |   39 +++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |    8 ++++++++
 2 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index f855ea4..e92fdbb 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -530,6 +530,41 @@ static int ioctl_fsthaw(struct file *filp)
 	return thaw_super(sb);
 }
 
+static int ioctl_fstrim(struct file *filp, void __user *argp)
+{
+	struct super_block *sb = filp->f_path.dentry->d_inode->i_sb;
+	struct fstrim_range range;
+	int ret = 0;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	/* If filesystem doesn't support trim feature, return. */
+	if (sb->s_op->trim_fs == NULL)
+		return -EOPNOTSUPP;
+
+	/* If a blockdevice-backed filesystem isn't specified, return EINVAL. */
+	if (sb->s_bdev == NULL)
+		return -EINVAL;
+
+	if (argp == NULL) {
+		range.start = 0;
+		range.len = ULLONG_MAX;
+		range.minlen = 0;
+	} else if (copy_from_user(&range, argp, sizeof(range)))
+		return -EFAULT;
+
+	ret = sb->s_op->trim_fs(sb, &range);
+	if (ret < 0)
+		return ret;
+
+	if ((argp != NULL) &&
+	    (copy_to_user(argp, &range, sizeof(range))))
+		return -EFAULT;
+
+	return 0;
+}
+
 /*
  * When you add any new common ioctls to the switches above and below
  * please update compat_sys_ioctl() too.
@@ -580,6 +615,10 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
 		error = ioctl_fsthaw(filp);
 		break;
 
+	case FITRIM:
+		error = ioctl_fstrim(filp, argp);
+		break;
+
 	case FS_IOC_FIEMAP:
 		return ioctl_fiemap(filp, arg);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0ec4d60e..63a0843 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -32,6 +32,12 @@
 #define SEEK_END	2	/* seek relative to end of file */
 #define SEEK_MAX	SEEK_END
 
+struct fstrim_range {
+	uint64_t start;
+	uint64_t len;
+	uint64_t minlen;
+};
+
 /* And dynamically-tunable limits and defaults: */
 struct files_stat_struct {
 	int nr_files;		/* read only */
@@ -312,6 +318,7 @@ struct inodes_stat_t {
 #define FIGETBSZ   _IO(0x00,2)	/* get the block size used for bmap */
 #define FIFREEZE	_IOWR('X', 119, int)	/* Freeze */
 #define FITHAW		_IOWR('X', 120, int)	/* Thaw */
+#define FITRIM		_IOWR('X', 121, struct fstrim_range)	/* Trim */
 
 #define	FS_IOC_GETFLAGS			_IOR('f', 1, long)
 #define	FS_IOC_SETFLAGS			_IOW('f', 2, long)
@@ -1573,6 +1580,7 @@ struct super_operations {
 	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
 #endif
 	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
+	int (*trim_fs) (struct super_block *, struct fstrim_range *);
 };
 
 /*
-- 
1.7.2.3



* [PATCH 3/4] ext4: Add batched discard support for ext4
  2010-09-27 14:09 [PATCH 0/4 v. 9] Ext3/Ext4 Batched discard support Lukas Czerner
  2010-09-27 14:09 ` [PATCH 1/4] ext4: Use return value from sb_issue_discard() Lukas Czerner
  2010-09-27 14:09 ` [PATCH 2/4] Add ioctl FITRIM Lukas Czerner
@ 2010-09-27 14:09 ` Lukas Czerner
  2010-10-25 18:50   ` Ted Ts'o
  2010-09-27 14:10 ` [PATCH 4/4] ext3: Add batched discard support for ext3 Lukas Czerner
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Lukas Czerner @ 2010-09-27 14:09 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, rwheeler, sandeen, adilger, lczerner, Dmitry Monakhov

Walk through allocation groups and trim all free extents. It can be
invoked through the FITRIM ioctl on the file system. The main idea is to
provide a way to trim the whole file system if needed, since some SSDs
may suffer from performance loss after the whole device has been filled
(which does not mean that the fs is full!).

It searches for free extents in the allocation groups specified by the
Byte range start -> start+len. When a free extent is within this range,
its blocks are marked as used and then trimmed. Afterwards these blocks
are marked as free in the per-group bitmap.
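
How a Byte range maps onto allocation groups can be sketched as follows.
This is a simplified illustration under assumptions, not the function
from the diff below: struct group_span and bytes_to_groups() are
hypothetical names, the geometry is passed in as plain parameters rather
than read from the superblock, and the clamping and saturation steps are
choices made for the sketch.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch. */
struct group_span {
	uint64_t first_group;	/* first allocation group touched */
	uint64_t last_group;	/* last allocation group touched */
	uint64_t first_block;	/* block offset inside first_group */
};

/*
 * Map the Byte range [start, start + len) onto allocation groups.
 * Returns 0 on success, -1 if the range lies beyond the last group.
 */
static int bytes_to_groups(uint64_t start, uint64_t len,
			   unsigned blocksize_bits, uint64_t first_data_block,
			   uint64_t blocks_per_group, uint64_t ngroups,
			   struct group_span *out)
{
	/* Convert Bytes to filesystem blocks (rounds down). */
	uint64_t start_blk = start >> blocksize_bits;
	uint64_t len_blk = len >> blocksize_bits;
	uint64_t end_blk;

	if (ngroups == 0 || blocks_per_group == 0)
		return -1;

	/* Clamp so the subtractions below cannot wrap around. */
	if (start_blk < first_data_block)
		start_blk = first_data_block;

	/* Saturate instead of overflowing for very large len. */
	end_blk = (len_blk > UINT64_MAX - start_blk) ? UINT64_MAX
						     : start_blk + len_blk;

	out->first_group = (start_blk - first_data_block) / blocks_per_group;
	out->last_group = (end_blk - first_data_block) / blocks_per_group;
	if (out->last_group > ngroups - 1)
		out->last_group = ngroups - 1;
	out->first_block = (start_blk - first_data_block) % blocks_per_group;

	return (out->first_group > out->last_group) ? -1 : 0;
}
```

With 4 KiB blocks and 32768 blocks per group, a range starting at block
32773 lands in group 1 at block offset 5.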

Since fstrim is a long operation it is good to be able to interrupt it
by a signal. This was added by Dmitry Monakhov.
Thanks Dmitry.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/ext4.h    |    2 +
 fs/ext4/mballoc.c |  191 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/ext4/super.c   |    1 +
 3 files changed, 194 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b364b9d..1d0bb44 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1631,6 +1631,8 @@ extern int ext4_mb_add_groupinfo(struct super_block *sb,
 extern int ext4_mb_get_buddy_cache_lock(struct super_block *, ext4_group_t);
 extern void ext4_mb_put_buddy_cache_lock(struct super_block *,
 						ext4_group_t, int);
+extern int ext4_trim_fs(struct super_block *, struct fstrim_range *);
+
 /* inode.c */
 struct buffer_head *ext4_getblk(handle_t *, struct inode *,
 						ext4_lblk_t, int, int *);
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 93eb6c2..80a5139 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4746,3 +4746,194 @@ error_return:
 		kmem_cache_free(ext4_ac_cachep, ac);
 	return;
 }
+
+/**
+ * ext4_trim_extent -- function to TRIM one single free extent in the group
+ * @sb:		super block for the file system
+ * @start:	starting block of the free extent in the alloc. group
+ * @count:	number of blocks to TRIM
+ * @group:	alloc. group we are working with
+ * @e4b:	ext4 buddy for the group
+ *
+ * Trim "count" blocks starting at "start" in the "group". To assure that no
+ * one will allocate those blocks, mark them as used in the buddy bitmap. This
+ * must be called under the group lock.
+ */
+static int ext4_trim_extent(struct super_block *sb, int start, int count,
+		ext4_group_t group, struct ext4_buddy *e4b)
+{
+	struct ext4_free_extent ex;
+	int ret = 0;
+
+	assert_spin_locked(ext4_group_lock_ptr(sb, group));
+
+	ex.fe_start = start;
+	ex.fe_group = group;
+	ex.fe_len = count;
+
+	/*
+	 * Mark blocks used, so no one can reuse them while
+	 * being trimmed.
+	 */
+	mb_mark_used(e4b, &ex);
+	ext4_unlock_group(sb, group);
+
+	ret = ext4_issue_discard(sb, group, start, count);
+	if (ret)
+		ext4_std_error(sb, ret);
+
+	ext4_lock_group(sb, group);
+	mb_free_blocks(NULL, e4b, start, ex.fe_len);
+	return ret;
+}
+
+/**
+ * ext4_trim_all_free -- function to trim all free space in alloc. group
+ * @sb:			super block for file system
+ * @e4b:		ext4 buddy
+ * @start:		first group block to examine
+ * @max:		last group block to examine
+ * @minblocks:		minimum extent block count
+ *
+ * ext4_trim_all_free walks through group's buddy bitmap searching for free
+ * extents. When the free block is found, ext4_trim_extent is called to TRIM
+ * the extent.
+ *
+ *
+ * ext4_trim_all_free walks through group's block bitmap searching for free
+ * extents. When the free extent is found, mark it as used in group buddy
+ * bitmap. Then issue a TRIM command on this extent and free the extent in
+ * the group buddy bitmap. This is done until whole group is scanned.
+ */
+ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
+		ext4_grpblk_t start, ext4_grpblk_t max, ext4_grpblk_t minblocks)
+{
+	void *bitmap;
+	ext4_grpblk_t next, count = 0;
+	ext4_group_t group;
+	int ret = 0;
+
+	BUG_ON(e4b == NULL);
+
+	bitmap = e4b->bd_bitmap;
+	group = e4b->bd_group;
+	start = (e4b->bd_info->bb_first_free > start) ?
+		e4b->bd_info->bb_first_free : start;
+	ext4_lock_group(sb, group);
+
+	while (start < max) {
+
+		start = mb_find_next_zero_bit(bitmap, max, start);
+		if (start >= max)
+			break;
+		next = mb_find_next_bit(bitmap, max, start);
+
+		if ((next - start) >= minblocks) {
+			ret = ext4_trim_extent(sb, start,
+				next - start, group, e4b);
+			if (ret < 0)
+				break;
+			count += next - start;
+		}
+		start = next + 1;
+
+		if (fatal_signal_pending(current)) {
+			count = -ERESTARTSYS;
+			break;
+		}
+
+		if (need_resched()) {
+			ext4_unlock_group(sb, group);
+			cond_resched();
+			ext4_lock_group(sb, group);
+		}
+
+		if ((e4b->bd_info->bb_free - count) < minblocks)
+			break;
+	}
+	ext4_unlock_group(sb, group);
+
+	ext4_debug("trimmed %d blocks in the group %d\n",
+		count, group);
+
+	if (ret < 0)
+		count = ret;
+
+	return count;
+}
+
+/**
+ * ext4_trim_fs() -- trim ioctl handle function
+ * @sb:			superblock for filesystem
+ * @range:		fstrim_range structure
+ *
+ * start:	First Byte to trim
+ * len:		number of Bytes to trim from start
+ * minlen:	minimum extent length in Bytes
+ * ext4_trim_fs goes through all allocation groups containing Bytes from
+ * start to start+len. For each such a group ext4_trim_all_free function
+ * is invoked to trim all free space.
+ */
+int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
+{
+	struct ext4_buddy e4b;
+	ext4_fsblk_t first_group, last_group;
+	ext4_group_t group, ngroups = ext4_get_groups_count(sb);
+	ext4_grpblk_t cnt = 0, first_block, last_block;
+	struct ext4_super_block *es = EXT4_SB(sb)->s_es;
+	uint64_t start, len, minlen, trimmed;
+	int ret = 0;
+
+	start = range->start >> sb->s_blocksize_bits;
+	len = range->len >> sb->s_blocksize_bits;
+	minlen = range->minlen >> sb->s_blocksize_bits;
+	trimmed = 0;
+
+	if (unlikely(minlen > EXT4_BLOCKS_PER_GROUP(sb)))
+		return -EINVAL;
+
+	/* Determine first and last group to examine based on start and len */
+	first_group = (start - le32_to_cpu(es->s_first_data_block)) /
+		      EXT4_BLOCKS_PER_GROUP(sb);
+	last_group = (start + len - le32_to_cpu(es->s_first_data_block)) /
+		     EXT4_BLOCKS_PER_GROUP(sb);
+	last_group = (last_group > ngroups - 1) ? ngroups - 1 : last_group;
+
+	if (first_group > last_group)
+		return -EINVAL;
+
+	first_block = (start - le32_to_cpu(es->s_first_data_block)) %
+			EXT4_BLOCKS_PER_GROUP(sb);
+	last_block = EXT4_BLOCKS_PER_GROUP(sb);
+
+	for (group = first_group; group <= last_group; group++) {
+
+		ret = ext4_mb_load_buddy(sb, group, &e4b);
+		if (ret) {
+			ext4_error(sb, "Error in loading buddy "
+					"information for %u", group);
+			break;
+		}
+
+		if (len >= EXT4_BLOCKS_PER_GROUP(sb))
+			len -= (EXT4_BLOCKS_PER_GROUP(sb) - first_block);
+		else
+			last_block = len;
+
+		if (e4b.bd_info->bb_free >= minlen) {
+			cnt = ext4_trim_all_free(sb, &e4b, first_block,
+						last_block, minlen);
+			if (cnt < 0) {
+				ret = cnt;
+				ext4_mb_unload_buddy(&e4b);
+				break;
+			}
+		}
+		ext4_mb_unload_buddy(&e4b);
+		trimmed += cnt;
+		first_block = 0;
+	}
+	range->len = trimmed * sb->s_blocksize;
+
+	return ret;
+}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9134abf..9b67155 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1171,6 +1171,7 @@ static const struct super_operations ext4_sops = {
 	.quota_write	= ext4_quota_write,
 #endif
 	.bdev_try_to_free_page = bdev_try_to_free_page,
+	.trim_fs	= ext4_trim_fs
 };
 
 static const struct super_operations ext4_nojournal_sops = {
-- 
1.7.2.3



* [PATCH 4/4] ext3: Add batched discard support for ext3
  2010-09-27 14:09 [PATCH 0/4 v. 9] Ext3/Ext4 Batched discard support Lukas Czerner
                   ` (2 preceding siblings ...)
  2010-09-27 14:09 ` [PATCH 3/4] ext4: Add batched discard support for ext4 Lukas Czerner
@ 2010-09-27 14:10 ` Lukas Czerner
  2010-09-27 14:11 ` [PATCH 0/4 v. 9] Ext3/Ext4 Batched discard support - fstrim Lukas Czerner
  2010-10-11 17:02 ` [PATCH 0/4 v. 9] Ext3/Ext4 Batched discard support Lukas Czerner
  5 siblings, 0 replies; 14+ messages in thread
From: Lukas Czerner @ 2010-09-27 14:10 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, rwheeler, sandeen, adilger, lczerner

Walk through allocation groups and trim all free extents. It can be
invoked through the FITRIM ioctl on the file system. The main idea is to
provide a way to trim the whole file system if needed, since some SSDs
may suffer from performance loss after the whole device has been filled
(which does not mean that the fs is full!).

It searches for free extents in the allocation groups specified by the
Byte range start -> start+len. When a free extent is within this range,
its blocks are marked as used and then trimmed. Afterwards these blocks
are marked as free in the per-group bitmap.
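
The claim, trim, release sequence described above can be sketched on a
toy bitmap. This is a single-threaded illustration only: it has no
locking and no journalling, and all names (discard_fn, noop_discard,
claim_trim_release and the bit helpers) are hypothetical rather than
taken from the diff below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical discard callback: 0 on success, negative on error. */
typedef int (*discard_fn)(size_t first_block, size_t nblocks);

/* Stand-in callback for the sketch; a real one would talk to the device. */
static int noop_discard(size_t first_block, size_t nblocks)
{
	(void)first_block;
	(void)nblocks;
	return 0;
}

static int test_bit8(const uint8_t *bm, size_t bit)
{
	return (bm[bit / 8] >> (bit % 8)) & 1;
}

static void set_bit8(uint8_t *bm, size_t bit)
{
	bm[bit / 8] |= (uint8_t)(1u << (bit % 8));
}

static void clear_bit8(uint8_t *bm, size_t bit)
{
	bm[bit / 8] &= (uint8_t)~(1u << (bit % 8));
}

/*
 * Claim the free extent beginning at `start`, discard it if it is at
 * least `minblocks` long, then release it again. Returns the number of
 * blocks discarded or a negative error from `discard`; *next receives
 * the block after the extent so the caller can continue scanning.
 */
static long claim_trim_release(uint8_t *bm, size_t start, size_t max,
			       size_t minblocks, discard_fn discard,
			       size_t *next)
{
	size_t end = start, bit;
	long ret = 0;

	/* Claim: mark blocks used so nothing can allocate them meanwhile. */
	while (end < max && !test_bit8(bm, end))
		set_bit8(bm, end++);

	/* Trim: short extents are not worth a discard request. */
	if (end > start && end - start >= minblocks) {
		int err = discard(start, end - start);

		ret = (err < 0) ? err : (long)(end - start);
	}

	/* Release: the blocks were free before and are free again. */
	for (bit = start; bit < end; bit++)
		clear_bit8(bm, bit);

	*next = end;
	return ret;
}
```

The bitmap is unchanged after the call whether or not the discard was
issued, which is why the extent is released on every path.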

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext3/balloc.c        |  256 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/ext3/super.c         |    1 +
 include/linux/ext3_fs.h |    1 +
 3 files changed, 258 insertions(+), 0 deletions(-)

diff --git a/fs/ext3/balloc.c b/fs/ext3/balloc.c
index 4a32511..11fb0b2 100644
--- a/fs/ext3/balloc.c
+++ b/fs/ext3/balloc.c
@@ -20,6 +20,7 @@
 #include <linux/ext3_jbd.h>
 #include <linux/quotaops.h>
 #include <linux/buffer_head.h>
+#include <linux/blkdev.h>
 
 /*
  * balloc.c contains the blocks allocation and deallocation routines
@@ -1882,3 +1883,258 @@ unsigned long ext3_bg_num_gdb(struct super_block *sb, int group)
 	return ext3_bg_num_gdb_meta(sb,group);
 
 }
+
+/**
+ * ext3_trim_all_free -- function to trim all free space in alloc. group
+ * @sb:			super block for file system
+ * @group:		allocation group to trim
+ * @start:		first group block to examine
+ * @max:		last group block to examine
+ * @gdp:		allocation group description structure
+ * @minblocks:		minimum extent block count
+ *
+ * ext3_trim_all_free walks through group's block bitmap searching for free
+ * blocks. When the free block is found, it tries to allocate this block and
+ * consequent free block to get the biggest free extent possible, until it
+ * reaches any used block. Then issue a TRIM command on this extent and free
+ * the extent in the block bitmap. This is done until whole group is scanned.
+ */
+ext3_grpblk_t ext3_trim_all_free(struct super_block *sb, unsigned int group,
+				ext3_grpblk_t start, ext3_grpblk_t max,
+				ext3_grpblk_t minblocks)
+{
+	handle_t *handle;
+	ext3_grpblk_t next, count = 0, bit;
+	struct ext3_sb_info *sbi;
+	ext3_fsblk_t discard_block;
+	struct buffer_head *bitmap_bh = NULL;
+	struct buffer_head *gdp_bh;
+	ext3_grpblk_t free_blocks;
+	struct ext3_group_desc *gdp;
+	int err = 0, ret = 0;
+	ext3_grpblk_t freed;
+
+	/*
+	 * We will update one block bitmap, and one group descriptor
+	 */
+	handle = ext3_journal_start_sb(sb, 2);
+	if (IS_ERR(handle)) {
+		err = PTR_ERR(handle);
+		return err;
+	}
+
+	bitmap_bh = read_block_bitmap(sb, group);
+	if (!bitmap_bh)
+		goto err_out;
+
+	BUFFER_TRACE(bitmap_bh, "getting undo access");
+	err = ext3_journal_get_undo_access(handle, bitmap_bh);
+	if (err)
+		goto err_out;
+
+	gdp = ext3_get_group_desc(sb, group, &gdp_bh);
+	if (!gdp)
+		goto err_out;
+
+	BUFFER_TRACE(gdp_bh, "get_write_access");
+	err = ext3_journal_get_write_access(handle, gdp_bh);
+	if (err)
+		goto err_out;
+
+	free_blocks = le16_to_cpu(gdp->bg_free_blocks_count);
+	sbi = EXT3_SB(sb);
+
+	 /* Walk through the whole group */
+	while (start < max) {
+
+		start = bitmap_search_next_usable_block(start, bitmap_bh, max);
+		if (start < 0)
+			break;
+		next = start;
+
+		/*
+		 * Allocate contiguous free extents by setting bits in the
+		 * block bitmap
+		 */
+		while (next < max
+			&& claim_block(sb_bgl_lock(sbi, group),
+					next, bitmap_bh)) {
+			next++;
+		}
+
+		 /* We did not claim any blocks */
+		if (next == start)
+			continue;
+
+		discard_block = (ext3_fsblk_t)start +
+				ext3_group_first_block_no(sb, group);
+
+		/* Update counters */
+		spin_lock(sb_bgl_lock(sbi, group));
+		le16_add_cpu(&gdp->bg_free_blocks_count, start - next);
+		spin_unlock(sb_bgl_lock(sbi, group));
+		percpu_counter_sub(&sbi->s_freeblocks_counter, next - start);
+
+		/* Do not issue a TRIM on extents smaller than minblocks */
+		if ((next - start) < minblocks)
+			goto free_extent;
+
+		 /* Send the TRIM command down to the device */
+		ret = sb_issue_discard(sb, discard_block, next - start,
+				       GFP_NOFS, 0);
+		count += (next - start);
+
+free_extent:
+		freed = 0;
+
+		/*
+		 * Clear bits in the bitmap
+		 */
+		for (bit = start; bit < next; bit++) {
+			BUFFER_TRACE(bitmap_bh, "clear bit");
+			if (!ext3_clear_bit_atomic(sb_bgl_lock(sbi, group),
+						bit, bitmap_bh->b_data)) {
+				ext3_error(sb, __func__,
+					"bit already cleared for block "E3FSBLK,
+					 (unsigned long)bit);
+				BUFFER_TRACE(bitmap_bh, "bit already cleared");
+			} else {
+				freed++;
+			}
+		}
+
+		/* Update counters */
+		spin_lock(sb_bgl_lock(sbi, group));
+		le16_add_cpu(&gdp->bg_free_blocks_count, freed);
+		spin_unlock(sb_bgl_lock(sbi, group));
+		percpu_counter_add(&sbi->s_freeblocks_counter, next - start);
+
+		start = next;
+
+		if (ret < 0) {
+			if (ret == -EOPNOTSUPP) {
+				ext3_warning(sb, __func__,
+					"discard not supported!");
+				count = ret;
+				break;
+			}
+			err = ret;
+			break;
+		}
+
+		if (fatal_signal_pending(current)) {
+			count = -ERESTARTSYS;
+			break;
+		}
+
+		cond_resched();
+
+		/* No more suitable extents */
+		if ((free_blocks - count) < minblocks)
+			break;
+	}
+
+	/* We dirtied the bitmap block */
+	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
+	err = ext3_journal_dirty_metadata(handle, bitmap_bh);
+
+	/* And the group descriptor block */
+	BUFFER_TRACE(gdp_bh, "dirtied group descriptor block");
+	ret = ext3_journal_dirty_metadata(handle, gdp_bh);
+	if (!err)
+		err = ret;
+
+	ext3_debug("trimmed %d blocks in the group %d\n",
+		count, group);
+
+err_out:
+	if (err) {
+		ext3_std_error(sb, err);
+		count = err;
+	}
+
+	ext3_journal_stop(handle);
+	brelse(bitmap_bh);
+
+	return count;
+}
+
+/**
+ * ext3_trim_fs() -- trim ioctl handle function
+ * @sb:			superblock for filesystem
+ * @start:		First Byte to trim
+ * @len:		number of Bytes to trim from start
+ * @minlen:		minimum extent length in Bytes
+ *
+ * ext3_trim_fs goes through all allocation groups containing Bytes from
+ * start to start+len. For each such a group ext3_trim_all_free function
+ * is invoked to trim all free space.
+ */
+int ext3_trim_fs(struct super_block *sb, struct fstrim_range *range)
+{
+	ext3_grpblk_t last_block, first_block, free_blocks;
+	unsigned long long first_group, last_group;
+	unsigned long group, ngroups;
+	struct ext3_group_desc *gdp;
+	struct ext3_super_block *es;
+	uint64_t start, len, minlen, trimmed;
+	int ret = 0;
+
+	start = range->start >> sb->s_blocksize_bits;
+	len = range->len >> sb->s_blocksize_bits;
+	minlen = range->minlen >> sb->s_blocksize_bits;
+	trimmed = 0;
+
+	if (unlikely(minlen > EXT3_BLOCKS_PER_GROUP(sb)))
+		return -EINVAL;
+
+	es = EXT3_SB(sb)->s_es;
+	ngroups = EXT3_SB(sb)->s_groups_count;
+	smp_rmb();
+
+	/* Determine first and last group to examine based on start and len */
+	first_group = (start - le32_to_cpu(es->s_first_data_block)) /
+		      EXT3_BLOCKS_PER_GROUP(sb);
+	last_group = (start + len - le32_to_cpu(es->s_first_data_block)) /
+		     EXT3_BLOCKS_PER_GROUP(sb);
+	last_group = (last_group > ngroups - 1) ? ngroups - 1 : last_group;
+
+	if (first_group > last_group)
+		return -EINVAL;
+
+	first_block = (start - le32_to_cpu(es->s_first_data_block)) %
+			EXT3_BLOCKS_PER_GROUP(sb);
+	last_block = EXT3_BLOCKS_PER_GROUP(sb);
+
+	for (group = first_group; group <= last_group; group++) {
+
+		gdp = ext3_get_group_desc(sb, group, NULL);
+		if (!gdp)
+			break;
+
+		free_blocks = le16_to_cpu(gdp->bg_free_blocks_count);
+		if (free_blocks < minlen)
+			continue;
+
+		if (len >= EXT3_BLOCKS_PER_GROUP(sb))
+			len -= (EXT3_BLOCKS_PER_GROUP(sb) - first_block);
+		else
+			last_block = len;
+
+		ret = ext3_trim_all_free(sb, group, first_block,
+					last_block, minlen);
+		if (ret < 0)
+			break;
+
+		trimmed += ret;
+
+		first_block = 0;
+	}
+
+	if (ret >= 0)
+		ret = 0;
+
+	range->len = trimmed * sb->s_blocksize;
+
+	return ret;
+}
diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 3777680..1a62efd 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -777,6 +777,7 @@ static const struct super_operations ext3_sops = {
 	.quota_write	= ext3_quota_write,
 #endif
 	.bdev_try_to_free_page = bdev_try_to_free_page,
+	.trim_fs	= ext3_trim_fs,
 };
 
 static const struct export_operations ext3_export_ops = {
diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
index 6ce1bca..a443965 100644
--- a/include/linux/ext3_fs.h
+++ b/include/linux/ext3_fs.h
@@ -856,6 +856,7 @@ extern struct ext3_group_desc * ext3_get_group_desc(struct super_block * sb,
 extern int ext3_should_retry_alloc(struct super_block *sb, int *retries);
 extern void ext3_init_block_alloc_info(struct inode *);
 extern void ext3_rsv_window_add(struct super_block *sb, struct ext3_reserve_window_node *rsv);
+extern int ext3_trim_fs(struct super_block *sb, struct fstrim_range *range);
 
 /* dir.c */
 extern int ext3_check_dir_entry(const char *, struct inode *,
-- 
1.7.2.3



* [PATCH 0/4 v. 9] Ext3/Ext4 Batched discard support - fstrim
  2010-09-27 14:09 [PATCH 0/4 v. 9] Ext3/Ext4 Batched discard support Lukas Czerner
                   ` (3 preceding siblings ...)
  2010-09-27 14:10 ` [PATCH 4/4] ext3: Add batched discard support for ext3 Lukas Czerner
@ 2010-09-27 14:11 ` Lukas Czerner
  2010-10-11 17:02 ` [PATCH 0/4 v. 9] Ext3/Ext4 Batched discard support Lukas Czerner
  5 siblings, 0 replies; 14+ messages in thread
From: Lukas Czerner @ 2010-09-27 14:11 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4, tytso, rwheeler, sandeen, adilger

/*
 * fstrim.c -- discard a part (or the whole) of a mounted filesystem.
 *
 * Copyright (C) 2009 Red Hat, Inc., Lukas Czerner <lczerner@redhat.com>
 *
 * %Begin-Header%
 * This file may be redistributed under the terms of the GNU Public
 * License.
 * %End-Header%
 *
 * Usage: fstrim [options] <mount point>
 */

#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <errno.h>
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <limits.h>
#include <stdarg.h>

#ifdef HAVE_GETOPT_H
#include <getopt.h>
#else
extern char *optarg;
extern int optind;
#endif

#include <sys/ioctl.h>
#include <sys/stat.h>
#include <linux/fs.h>

#ifndef FITRIM
struct fstrim_range {
	uint64_t start;
	uint64_t len;
	uint64_t minlen;
};
#define FITRIM		_IOWR('X', 121, struct fstrim_range)
#endif

const char *program_name = "fstrim";

struct options {
	struct fstrim_range *range;
	char mpoint[PATH_MAX];
	char verbose;
};

static void usage(void)
{
	fprintf(stderr,
		"Usage: %s [-s start] [-l length] [-m minimum-extent]"
		" [-v] {mountpoint}\n\t"
		"-s Starting Byte to discard from\n\t"
		"-l Number of Bytes to discard from the start\n\t"
		"-m Minimum extent length to discard\n\t"
		"-v Verbose - number of discarded bytes\n",
		program_name);
}

static void err_exit(const char *fmt, ...)
{
	va_list pvar;
	va_start(pvar, fmt);
	vfprintf(stderr, fmt, pvar);
	va_end(pvar);
	usage();
	exit(EXIT_FAILURE);
}

static void err_range(const char *optarg)
{
	err_exit("%s: %s (%s)\n", program_name, strerror(ERANGE), optarg);
}

/**
 * Get the number from argument. It can be number followed by
 * units: k|K, m|M, g|G, t|T
 */
static unsigned long long get_number(char **optarg)
{
	char *opt, *end;
	unsigned long long number, max;

	/* get the max to avoid overflow */
	max = ULLONG_MAX / 1024;
	number = 0;
	opt = *optarg;

	errno = 0;
	number = strtoull(opt, &end, 0);
	if (errno)
		err_exit("%s: %s (%s)\n", program_name,
			 strerror(errno), *optarg);

	/* determine if units are defined */
	switch (*end) {
	case 'T': /* terabytes */
	case 't':
		if (number > max)
			err_range(*optarg);
		number *= 1024;
	case 'G': /* gigabytes */
	case 'g':
		if (number > max)
			err_range(*optarg);
		number *= 1024;
	case 'M': /* megabytes */
	case 'm':
		if (number > max)
			err_range(*optarg);
		number *= 1024;
	case 'K': /* kilobytes */
	case 'k':
		if (number > max)
			err_range(*optarg);
		number *= 1024;
		end++;
	case '\0': /* end of the string */
		break;
	default:
		err_exit("%s: %s (%s)\n", program_name,
			 strerror(EINVAL), *optarg);
		return 0;
	}

	if (*end != '\0') {
		err_exit("%s: %s (%s)\n", program_name,
			 strerror(EINVAL), *optarg);
	}

	return number;
}

static int parse_opts(int argc, char **argv, struct options *opts)
{
	int c;

	while ((c = getopt(argc, argv, "s:l:m:v")) != EOF) {
		switch (c) {
		case 's': /* starting point */
			opts->range->start = get_number(&optarg);
			break;
		case 'l': /* length */
			opts->range->len = get_number(&optarg);
			break;
		case 'm': /* minlen */
			opts->range->minlen = get_number(&optarg);
			break;
		case 'v': /* verbose */
			opts->verbose = 1;
			break;
		default:
			return EXIT_FAILURE;
		}
	}

	return 0;
}

static void free_opts(struct options *opts)
{
	if (opts) {
		if (opts->range)
			free(opts->range);
		free(opts);
	}
}

static void free_opts_and_exit(struct options *opts)
{
	free_opts(opts);
	exit(EXIT_FAILURE);
}

static void print_usage_and_exit(struct options *opts)
{
	usage();
	free_opts_and_exit(opts);
}

int main(int argc, char **argv)
{
	struct options *opts;
	struct stat sb;
	int fd, ret = 0;

	/* calloc() zeroes mpoint, so it is a valid empty string by default */
	opts = calloc(1, sizeof(struct options));
	if (!opts)
		err_exit("%s: calloc(): %s\n", program_name, strerror(errno));

	opts->range = NULL;
	opts->verbose = 0;

	if (argc > 1)
		strncpy(opts->mpoint, argv[argc - 1],
			sizeof(opts->mpoint) - 1);

	if (argc > 2) {
		opts->range = calloc(1, sizeof(struct fstrim_range));
		if (!opts->range) {
			fprintf(stderr, "%s: calloc(): %s\n", program_name,
				strerror(errno));
			free_opts_and_exit(opts);
		}
		opts->range->len = ULLONG_MAX;
		ret = parse_opts(argc, argv, opts);
	}

	if (ret)
		print_usage_and_exit(opts);

	if (strnlen(opts->mpoint, 1) < 1) {
		fprintf(stderr, "%s: You have to specify mount point.\n",
			program_name);
		print_usage_and_exit(opts);
	}

	if (stat(opts->mpoint, &sb) == -1) {
		fprintf(stderr, "%s: %s: %s\n", program_name,
			opts->mpoint, strerror(errno));
		print_usage_and_exit(opts);
	}

	if (!S_ISDIR(sb.st_mode)) {
		fprintf(stderr, "%s: %s: (%s)\n", program_name,
			opts->mpoint, strerror(ENOTDIR));
		print_usage_and_exit(opts);
	}

	fd = open(opts->mpoint, O_RDONLY);
	if (fd < 0) {
		fprintf(stderr, "%s: open(%s): %s\n", program_name,
			opts->mpoint, strerror(errno));
		free_opts_and_exit(opts);
	}

	if (ioctl(fd, FITRIM, opts->range)) {
		fprintf(stderr, "%s: FITRIM: %s\n", program_name,
			strerror(errno));
		free_opts_and_exit(opts);
	}

	/* len is uint64_t; cast so the format is also correct on 32-bit */
	if ((opts->verbose) && (opts->range))
		fprintf(stdout, "%llu bytes were trimmed\n",
			(unsigned long long)opts->range->len);

	free_opts(opts);
	return ret;
}

* Re: [PATCH 0/4 v. 9] Ext3/Ext4 Batched discard support
  2010-09-27 14:09 [PATCH 0/4 v. 9] Ext3/Ext4 Batched discard support Lukas Czerner
                   ` (4 preceding siblings ...)
  2010-09-27 14:11 ` [PATCH 0/4 v. 9] Ext3/Ext4 Batched discard support - fstrim Lukas Czerner
@ 2010-10-11 17:02 ` Lukas Czerner
  2010-10-25 14:57   ` Ted Ts'o
  5 siblings, 1 reply; 14+ messages in thread
From: Lukas Czerner @ 2010-10-11 17:02 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4, tytso, rwheeler, sandeen, adilger

On Mon, 27 Sep 2010, Lukas Czerner wrote:

> Hi,
> 
> just some minor changes here. As Andreas pointed out it is better to
> have special structure to pass FITRIM arguments into ioctl, instead
> of just an array of "noname" elements. So I have introduced this
> structure:
> 
> struct fstrim_range {
> 	uint64_t start;
> 	uint64_t len;
> 	uint64_t minlen;
> };
> 
> Also the tool which uses FSTRIM ioctl gets updated.
> 
> 
> SHORT DESCRIPTION:
> ==================
> 
> Batched discard adds ability to discard free space on mounted filesystem,
> in order to avoid using current discard implementation which discards
> recently freed blocks. This approach may on some devices (it depends on
> how efficient is the device wear-leveling algorithm) result in huge
> performance loss.
> 
> Batched discard can be invoked from user-space through FITRIM ioctl on
> the whole, or just a part, of file system. With this approach we are
> searching for continuous free blocks bigger than defined through ioctl
> to discard them. So, since we are searching for big continuous extents
> it is much more efficient than current approach and it gives user fine
> grained control over how much disk space will be reclaimed for
> wear-leveling and what impact will it have on performance.
> 
> 
> I have attached source code for example application which uses FITRIM
> to discard just a part or whole filesystem. Since FITRIM is filesystem
> independent ioctl it can be used by any filesystem which supports it.
> 
> Usage: fstrim [-s start] [-l length] [-m minimum-extent] [-v] {mountpoint}
>         -s Starting Byte to discard from
>         -l Number of Bytes to discard from the start
>         -m Minimum extent length to discard
>         -v Verbose - number of discarded bytes
> 
> ---
> bd6a5a3 ext3: Add batched discard support for ext3
> 9dcabb2 ext4: Add batched discard support for ext4
> 9c8c3a5 Add ioctl FITRIM.
> 787dbea ext4: Use return value from sb_issue_discard()
> 
>  fs/ext3/balloc.c        |  256 +++++++++++++++++++++++++++++++++++++++++++++++
>  fs/ext3/super.c         |    1 +
>  fs/ext4/ext4.h          |    2 +
>  fs/ext4/mballoc.c       |  194 +++++++++++++++++++++++++++++++++++-
>  fs/ext4/super.c         |    1 +
>  fs/ioctl.c              |   39 +++++++
>  include/linux/ext3_fs.h |    1 +
>  include/linux/fs.h      |    8 ++
>  8 files changed, 501 insertions(+), 1 deletions(-)
> 
Hi Ted,

I was wondering, is there anything still holding this patch set back
from being merged?

Thanks!

-Lukas


* Re: [PATCH 0/4 v. 9] Ext3/Ext4 Batched discard support
  2010-10-11 17:02 ` [PATCH 0/4 v. 9] Ext3/Ext4 Batched discard support Lukas Czerner
@ 2010-10-25 14:57   ` Ted Ts'o
  2010-10-25 16:06     ` Fstrim tool Lukas Czerner
  0 siblings, 1 reply; 14+ messages in thread
From: Ted Ts'o @ 2010-10-25 14:57 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4, rwheeler, sandeen, adilger

I've added the first three patches (all but the ext3 patch) to the
ext4 patch series.  I've desk checked the code; but do you have a
handy-dandy userspace program written to trigger the ioctl?  If so,
could you send it to me so I can do some live fire testing?  Many
thanks!!

						- Ted

* Fstrim tool
  2010-10-25 14:57   ` Ted Ts'o
@ 2010-10-25 16:06     ` Lukas Czerner
  0 siblings, 0 replies; 14+ messages in thread
From: Lukas Czerner @ 2010-10-25 16:06 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: Lukas Czerner, linux-ext4, rwheeler, sandeen, adilger

On Mon, 25 Oct 2010, Ted Ts'o wrote:

> I've added the first three patches (all but the ext3 patch) to the
> ext4 patch series.  I've desk checked the code; but do you have a
> handy-dandy userspace program written to trigger the ioctl?  If so,
> could you send it to me so I can do some live fire testing?  Many
> thanks!!
> 
> 						- Ted
> 

Hi Ted,

Thanks for merging it. I posted a userspace program for this on the ext4
list earlier, but here it is again. Note that I have recently made some
changes and have not fully tested it since then (but it should be ok).

Thanks again!

-Lukas

Here it is:

---

/*
 * fstrim.c -- discard part (or the whole) of a mounted filesystem.
 *
 * Copyright (C) 2009 Red Hat, Inc., Lukas Czerner <lczerner@redhat.com>
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 *
 * Usage: fstrim [options] <mount point>
 */

#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <errno.h>
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <limits.h>
#include <stdarg.h>
#include <getopt.h>

#include <sys/ioctl.h>
#include <sys/stat.h>
#include <linux/fs.h>

#ifndef FITRIM
struct fstrim_range {
	uint64_t start;
	uint64_t len;
	uint64_t minlen;
};
#define FITRIM		_IOWR('X', 121, struct fstrim_range)
#endif

const char *program_name = "fstrim";

struct options {
	struct fstrim_range *range;
	char mpoint[PATH_MAX];
	char verbose;
};

static void usage(void)
{
	fprintf(stderr,
		"Usage: %s [-s start] [-l length] [-m minimum-extent]"
		" [-v] {mountpoint}\n\t"
		"-s Starting Byte to discard from\n\t"
		"-l Number of Bytes to discard from the start\n\t"
		"-m Minimum extent length to discard\n\t"
		"-v Verbose - number of discarded bytes\n",
		program_name);
}

static void err_exit(const char *fmt, ...)
{
	va_list pvar;
	va_start(pvar, fmt);
	vfprintf(stderr, fmt, pvar);
	va_end(pvar);
	usage();
	exit(EXIT_FAILURE);
}

/**
 * Get the number from argument. It can be number followed by
 * units: k|K, m|M, g|G, t|T
 */
static unsigned long long get_number(char **optarg)
{
	char *opt, *end;
	unsigned long long number, max;

	/* get the max to avoid overflow */
	max = ULLONG_MAX / 1024;
	number = 0;
	opt = *optarg;

	errno = 0;
	number = strtoull(opt, &end, 0);
	if (errno)
		err_exit("%s: %s (%s)\n", program_name,
			 strerror(errno), *optarg);

	/*
	 * Convert units to numbers. Fall-through stack is used for units
	 * so absence of breaks is intentional.
	 */
	switch (*end) {
	case 'T': /* terabytes */
	case 't':
		if (number > max)
			err_exit("%s: %s (%s)\n", program_name,
				 strerror(ERANGE), *optarg);
		number *= 1024;
	case 'G': /* gigabytes */
	case 'g':
		if (number > max)
			err_exit("%s: %s (%s)\n", program_name,
				 strerror(ERANGE), *optarg);
		number *= 1024;
	case 'M': /* megabytes */
	case 'm':
		if (number > max)
			err_exit("%s: %s (%s)\n", program_name,
				 strerror(ERANGE), *optarg);
		number *= 1024;
	case 'K': /* kilobytes */
	case 'k':
		if (number > max)
			err_exit("%s: %s (%s)\n", program_name,
				 strerror(ERANGE), *optarg);
		number *= 1024;
		end++;
	case '\0': /* end of the string */
		break;
	default:
		err_exit("%s: %s (%s)\n", program_name,
			 strerror(EINVAL), *optarg);
		return 0;
	}

	if (*end != '\0') {
		err_exit("%s: %s (%s)\n", program_name,
			 strerror(EINVAL), *optarg);
	}

	return number;
}

static int parse_opts(int argc, char **argv, struct options *opts)
{
	int c;

	while ((c = getopt(argc, argv, "s:l:m:v")) != -1) {
		switch (c) {
		case 's': /* starting point */
			opts->range->start = get_number(&optarg);
			break;
		case 'l': /* length */
			opts->range->len = get_number(&optarg);
			break;
		case 'm': /* minlen */
			opts->range->minlen = get_number(&optarg);
			break;
		case 'v': /* verbose */
			opts->verbose = 1;
			break;
		default:
			return EXIT_FAILURE;
		}
	}

	return 0;
}

int main(int argc, char **argv)
{
	struct options *opts;
	struct stat sb;
	int fd, err = 0, ret = EXIT_FAILURE;

	opts = malloc(sizeof(struct options));
	if (!opts)
		err_exit("%s: malloc(): %s\n", program_name, strerror(errno));

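	opts->range = NULL;
	opts->verbose = 0;
	opts->mpoint[0] = '\0';	/* malloc() does not zero the buffer */

	if (argc > 1) {
		/* strncpy() does not terminate on truncation; do it by hand */
		strncpy(opts->mpoint, argv[argc - 1], sizeof(opts->mpoint) - 1);
		opts->mpoint[sizeof(opts->mpoint) - 1] = '\0';
	}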

	if (argc > 2) {
		opts->range = calloc(1, sizeof(struct fstrim_range));
		if (!opts->range) {
			fprintf(stderr, "%s: calloc(): %s\n", program_name,
				strerror(errno));
			goto free_opts;
		}
		opts->range->len = ULLONG_MAX;
		err = parse_opts(argc, argv, opts);
	}

	if (err) {
		usage();
		goto free_opts;
	}

	if (strnlen(opts->mpoint, 1) < 1) {
		fprintf(stderr, "%s: You have to specify mount point.\n",
			program_name);
		usage();
		goto free_opts;
	}

	if (stat(opts->mpoint, &sb) == -1) {
		fprintf(stderr, "%s: %s: %s\n", program_name,
			opts->mpoint, strerror(errno));
		usage();
		goto free_opts;
	}

	if (!S_ISDIR(sb.st_mode)) {
		fprintf(stderr, "%s: %s: (%s)\n", program_name,
			opts->mpoint, strerror(ENOTDIR));
		usage();
		goto free_opts;
	}

	fd = open(opts->mpoint, O_RDONLY);
	if (fd < 0) {
		fprintf(stderr, "%s: open(%s): %s\n", program_name,
			opts->mpoint, strerror(errno));
		goto free_opts;
	}

	if (ioctl(fd, FITRIM, opts->range)) {
		fprintf(stderr, "%s: FITRIM: %s\n", program_name,
			strerror(errno));
		goto free_opts;
	}

	/* len is uint64_t; cast so the format is also correct on 32-bit */
	if ((opts->verbose) && (opts->range))
		fprintf(stdout, "%llu bytes were trimmed\n",
			(unsigned long long)opts->range->len);

	ret = EXIT_SUCCESS;

free_opts:
	if (opts) {
		if (opts->range)
			free(opts->range);
		free(opts);
	}

	return ret;
}
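
For anyone who wants to exercise the unit-suffix handling of get_number() on its own, here is a minimal sketch under my own assumptions. parse_size() is a hypothetical name and is not part of fstrim.c; unlike get_number() it reports failure through its return value and errno instead of exiting, and it rejects a leading sign or blank, which strtoull() alone would accept.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * parse_size() is a hypothetical helper, not part of fstrim.c.
 * It parses "123", "4k", "2M", "1g" or "1T" into a byte count.
 * Returns 0 and stores the result in *out on success; returns -1
 * with errno set to EINVAL or ERANGE on failure.
 */
int parse_size(const char *arg, unsigned long long *out)
{
	unsigned long long number;
	char *end;
	int steps;	/* how many times to multiply by 1024 */

	assert(arg != NULL && out != NULL);

	/* strtoull() alone would accept a sign or leading blanks */
	if (*arg < '0' || *arg > '9') {
		errno = EINVAL;
		return -1;
	}

	errno = 0;
	number = strtoull(arg, &end, 0);
	if (errno)
		return -1;

	switch (*end) {
	case 'T': case 't': steps = 4; break;
	case 'G': case 'g': steps = 3; break;
	case 'M': case 'm': steps = 2; break;
	case 'K': case 'k': steps = 1; break;
	case '\0':          steps = 0; break;
	default:
		errno = EINVAL;
		return -1;
	}

	/* Nothing may follow the unit suffix. */
	if (steps > 0 && end[1] != '\0') {
		errno = EINVAL;
		return -1;
	}

	/* Check before every multiplication so the result cannot wrap. */
	while (steps-- > 0) {
		if (number > ULLONG_MAX / 1024) {
			errno = ERANGE;
			return -1;
		}
		number *= 1024;
	}

	*out = number;
	return 0;
}
```

The loop does the same overflow check as the fall-through switch in get_number(), only with an explicit count, so there are no case labels that depend on a missing break.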


* Re: [PATCH 3/4] ext4: Add batched discard support for ext4
  2010-09-27 14:09 ` [PATCH 3/4] ext4: Add batched discard support for ext4 Lukas Czerner
@ 2010-10-25 18:50   ` Ted Ts'o
  2010-10-25 19:08     ` Ted Ts'o
  2010-10-26 12:43     ` Lukas Czerner
  0 siblings, 2 replies; 14+ messages in thread
From: Ted Ts'o @ 2010-10-25 18:50 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4, rwheeler, sandeen, adilger, Dmitry Monakhov

On Mon, Sep 27, 2010 at 04:09:59PM +0200, Lukas Czerner wrote:
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 93eb6c2..80a5139 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
      ....
> +	struct ext4_buddy e4b;
> +	ext4_fsblk_t first_group, last_group;

This should be ext4_group_t, shouldn't it?  

> +	/* Determine first and last group to examine based on start and len */
> +	first_group = (start - le32_to_cpu(es->s_first_data_block)) /
> +		      EXT4_BLOCKS_PER_GROUP(sb);
> +	last_group = (start + len - le32_to_cpu(es->s_first_data_block)) /
> +		     EXT4_BLOCKS_PER_GROUP(sb);

I've tried compiling this for 32-bit x86, and this blows up because
you can't divide long long's in the kernel.  (This is what do_div is
for, and it's why ext4_get_group_no_and_offset() exists.)

> +	first_block = (start - le32_to_cpu(es->s_first_data_block)) %
> +			EXT4_BLOCKS_PER_GROUP(sb);
> +	last_block = EXT4_BLOCKS_PER_GROUP(sb);

This means that the ext4 FITRIM ioctl will trim to the end of the
blockgroup, and not just to the last block specified by the user.  Is
this intentional?

Also, it looks like there's nothing to check for the last blockgroup,
where the last block might be less than a grpblk_t offset of
EXT4_BLOCKS_PER_GROUP()?

						- Ted
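
To illustrate the arithmetic being discussed, here is a userspace sketch of the group/offset split. group_no_and_offset() is a hypothetical stand-in, not the kernel helper: in userspace the compiler's runtime supplies 64-bit division, whereas 32-bit kernel code has to go through do_div(), which divides its first argument in place and returns the remainder.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the group/offset computation.
 * blocknr is an absolute filesystem block number; first_data_block and
 * blocks_per_group play the role of the superblock fields.
 */
void group_no_and_offset(uint64_t blocknr, uint32_t first_data_block,
			 uint32_t blocks_per_group,
			 uint32_t *group, uint32_t *offset)
{
	assert(blocks_per_group != 0);
	assert(blocknr >= first_data_block);

	/* Groups are counted from the first data block, not from block 0. */
	blocknr -= first_data_block;

	/* In the kernel this pair would be a single do_div() call. */
	*offset = (uint32_t)(blocknr % blocks_per_group);
	*group = (uint32_t)(blocknr / blocks_per_group);
}
```

With 32768 blocks per group and a first data block of 0, block 100000 lands in group 3 at offset 1696.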

* Re: [PATCH 3/4] ext4: Add batched discard support for ext4
  2010-10-25 18:50   ` Ted Ts'o
@ 2010-10-25 19:08     ` Ted Ts'o
  2010-10-26 12:43     ` Lukas Czerner
  1 sibling, 0 replies; 14+ messages in thread
From: Ted Ts'o @ 2010-10-25 19:08 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4, rwheeler, sandeen, adilger, Dmitry Monakhov

The following diff (against this patch) allows the ext4 tree to build
on 32-bit x86.  I'm still not entirely convinced the right thing
happens when start+len goes beyond the end of the file system....

	     	       	    	   - Ted

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 8cbef43..e3bcc06 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4761,7 +4761,6 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
 	ext4_lock_group(sb, group);
 
 	while (start < max) {
-
 		start = mb_find_next_zero_bit(bitmap, max, start);
 		if (start >= max)
 			break;
@@ -4816,10 +4815,9 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
 int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
 {
 	struct ext4_buddy e4b;
-	ext4_fsblk_t first_group, last_group;
+	ext4_group_t first_group, last_group;
 	ext4_group_t group, ngroups = ext4_get_groups_count(sb);
 	ext4_grpblk_t cnt = 0, first_block, last_block;
-	struct ext4_super_block *es = EXT4_SB(sb)->s_es;
 	uint64_t start, len, minlen, trimmed;
 	int ret = 0;
 
@@ -4832,21 +4830,17 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
 		return -EINVAL;
 
 	/* Determine first and last group to examine based on start and len */
-	first_group = (start - le32_to_cpu(es->s_first_data_block)) /
-		      EXT4_BLOCKS_PER_GROUP(sb);
-	last_group = (start + len - le32_to_cpu(es->s_first_data_block)) /
-		     EXT4_BLOCKS_PER_GROUP(sb);
+	ext4_get_group_no_and_offset(sb, (ext4_fsblk_t) start,
+				     &first_group, &first_block);
+	ext4_get_group_no_and_offset(sb, (ext4_fsblk_t) (start + len),
+				     &last_group, &last_block);
 	last_group = (last_group > ngroups - 1) ? ngroups - 1 : last_group;
+	last_block = EXT4_BLOCKS_PER_GROUP(sb);
 
 	if (first_group > last_group)
 		return -EINVAL;
 
-	first_block = (start - le32_to_cpu(es->s_first_data_block)) %
-			EXT4_BLOCKS_PER_GROUP(sb);
-	last_block = EXT4_BLOCKS_PER_GROUP(sb);
-
 	for (group = first_group; group <= last_group; group++) {
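
On the start+len concern above: fstrim sets len to ULLONG_MAX by default, so a plain start + len can wrap around. The following is only a sketch of one way to clamp the end of the range without overflow. clamp_range_end() and total_blocks are hypothetical names, and this is not code from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the exclusive end block of the range
 * [start, start + len), clamped to total_blocks. The comparison is
 * arranged so that start + len is only computed when it cannot wrap.
 */
uint64_t clamp_range_end(uint64_t start, uint64_t len, uint64_t total_blocks)
{
	/* Range starts at or past the end: nothing inside the filesystem. */
	if (start >= total_blocks)
		return total_blocks;

	/* len reaches past the end (this also covers len == UINT64_MAX). */
	if (len > total_blocks - start)
		return total_blocks;

	return start + len;
}
```

A caller would treat a result equal to or below start as an empty range.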

* Re: [PATCH 3/4] ext4: Add batched discard support for ext4
  2010-10-25 18:50   ` Ted Ts'o
  2010-10-25 19:08     ` Ted Ts'o
@ 2010-10-26 12:43     ` Lukas Czerner
  2010-10-26 14:20       ` Ted Ts'o
  1 sibling, 1 reply; 14+ messages in thread
From: Lukas Czerner @ 2010-10-26 12:43 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Lukas Czerner, linux-ext4, rwheeler, sandeen, adilger,
	Dmitry Monakhov

On Mon, 25 Oct 2010, Ted Ts'o wrote:

> On Mon, Sep 27, 2010 at 04:09:59PM +0200, Lukas Czerner wrote:
> > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> > index 93eb6c2..80a5139 100644
> > --- a/fs/ext4/mballoc.c
> > +++ b/fs/ext4/mballoc.c
>       ....
> > +	struct ext4_buddy e4b;
> > +	ext4_fsblk_t first_group, last_group;
> 
> This should be ext4_group_t, shouldn't it?  

Right, it should be. Sorry about that.

> 
> > +	/* Determine first and last group to examine based on start and len */
> > +	first_group = (start - le32_to_cpu(es->s_first_data_block)) /
> > +		      EXT4_BLOCKS_PER_GROUP(sb);
> > +	last_group = (start + len - le32_to_cpu(es->s_first_data_block)) /
> > +		     EXT4_BLOCKS_PER_GROUP(sb);
> 
> I've tried compiling this for 32-bit x86, and this blows up because
> you can't divide long long's in the kernel.  (This is what do_div is
> for, and it's why ext4_get_group_no_and_offset() exists.)

I am ashamed, I probably should test patches on different architectures.
Thanks.

> 
> > +	first_block = (start - le32_to_cpu(es->s_first_data_block)) %
> > +			EXT4_BLOCKS_PER_GROUP(sb);
> > +	last_block = EXT4_BLOCKS_PER_GROUP(sb);
> 
> This means that the ext4 FITRIM ioctl will trim to the end of the
> blockgroup, and not just to the last block specified by the user.  Is
> this intentional?

If the group is NOT the last group, or (start+len) is aligned to the
EXT4_BLOCKS_PER_GROUP() boundary, we will trim all blocks in this
particular block group. Otherwise we know how much we need to trim
in this group to satisfy the user's request:

	if (len >= EXT4_BLOCKS_PER_GROUP(sb))
		len -= (EXT4_BLOCKS_PER_GROUP(sb) - first_block);
	else
		last_block = len;

because we keep track of how many blocks we still need to trim by
decreasing len.
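
To make the bookkeeping easier to follow, here is a userspace sketch that splits a block range into per-group spans while decreasing len. It is a simplified formulation under my own assumptions, not the patch's exact code: split_range() and struct group_span are hypothetical, the first data block is taken to be 0, and the end of each span is measured from first_block (the fragment above assigns last_block = len).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One group's share of the range: blocks [first_block, last_block). */
struct group_span {
	uint32_t group;
	uint32_t first_block;
	uint32_t last_block;
};

/*
 * Hypothetical helper: split [start, start + len) into per-group spans.
 * bpg is the number of blocks per group. Returns the number of spans
 * written to out (at most max_spans); groups >= ngroups are ignored.
 */
size_t split_range(uint64_t start, uint64_t len, uint32_t bpg,
		   uint32_t ngroups, struct group_span *out, size_t max_spans)
{
	uint32_t group, first_block;
	size_t n = 0;

	assert(bpg != 0);
	if (start / bpg >= ngroups)
		return 0;

	group = (uint32_t)(start / bpg);
	first_block = (uint32_t)(start % bpg);

	while (len > 0 && group < ngroups && n < max_spans) {
		uint64_t room = bpg - first_block;	/* blocks left in group */
		uint64_t span = len < room ? len : room;

		out[n].group = group;
		out[n].first_block = first_block;
		out[n].last_block = first_block + (uint32_t)span;
		n++;

		len -= span;		/* what is still owed to the caller */
		first_block = 0;	/* later groups start at their first block */
		group++;
	}
	return n;
}
```

Every group except possibly the first and the last is covered in full, which matches the behaviour described above.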

> 
> Also, it looks like there's nothing to check for the last blockgroup,
> where the last block might be less than a grpblk_t offset of
> EXT4_BLOCKS_PER_GROUP()?

This is not a problem, because when traversing the bitmap we will hit
the end of the group anyway, because those blocks (out of filesystem) are
marked as used in the bitmap and hence:

	start = mb_find_next_zero_bit(bitmap, max, start);
	if (start >= max)
		break;

will end the traversal without any attempt to trim blocks beyond the
filesystem boundary.
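
The traversal can be illustrated in userspace roughly like this. It is only a sketch under my own assumptions: count_trimmable() and bit_is_set() are hypothetical, the bitmap is a plain byte array with one bit per block (set means in use), and instead of issuing discards it just counts the blocks that would be discarded.

```c
#include <stdint.h>

/* Return non-zero if the block at index bit is marked as used. */
static int bit_is_set(const uint8_t *bitmap, uint32_t bit)
{
	return (bitmap[bit / 8] >> (bit % 8)) & 1;
}

/*
 * Hypothetical helper: walk bits [start, max) and count the blocks in
 * free extents that are at least minlen blocks long.
 */
uint32_t count_trimmable(const uint8_t *bitmap, uint32_t start,
			 uint32_t max, uint32_t minlen)
{
	uint32_t trimmed = 0;

	while (start < max) {
		uint32_t next;

		/* Skip used blocks to find the start of a free extent. */
		while (start < max && bit_is_set(bitmap, start))
			start++;
		if (start >= max)
			break;

		/* Find the end of the free extent (first used bit or max). */
		next = start;
		while (next < max && !bit_is_set(bitmap, next))
			next++;

		if (next - start >= minlen)
			trimmed += next - start;

		/* next is a used block or max, so resume just past it. */
		start = next + 1;
	}
	return trimmed;
}
```

Because the bits past the end of the filesystem are set, the scan stops at them by itself and no separate check for the last group is needed in this sketch.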

> 
> 						- Ted
> 

-Lukas

* Re: [PATCH 3/4] ext4: Add batched discard support for ext4
  2010-10-26 12:43     ` Lukas Czerner
@ 2010-10-26 14:20       ` Ted Ts'o
  2010-10-26 14:36         ` Lukas Czerner
  0 siblings, 1 reply; 14+ messages in thread
From: Ted Ts'o @ 2010-10-26 14:20 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4, rwheeler, sandeen, adilger, Dmitry Monakhov

OK, thanks.  Does the interdiff (or the updated
add-batched-discard-support-for-ext4 as found in the ext4 patch queue)
look good to you?

     	     				- Ted


* Re: [PATCH 3/4] ext4: Add batched discard support for ext4
  2010-10-26 14:20       ` Ted Ts'o
@ 2010-10-26 14:36         ` Lukas Czerner
  0 siblings, 0 replies; 14+ messages in thread
From: Lukas Czerner @ 2010-10-26 14:36 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Lukas Czerner, linux-ext4, rwheeler, sandeen, adilger,
	Dmitry Monakhov

On Tue, 26 Oct 2010, Ted Ts'o wrote:

> OK, thanks.  Does the interdiff (or the updated
> add-batched-discard-support-for-ext4 as found in the ext4 patch queue)
> look good to you?
> 
>      	     				- Ted
> 

It looks good to me. Thanks!

-Lukas

Thread overview: 14+ messages
2010-09-27 14:09 [PATCH 0/4 v. 9] Ext3/Ext4 Batched discard support Lukas Czerner
2010-09-27 14:09 ` [PATCH 1/4] ext4: Use return value from sb_issue_discard() Lukas Czerner
2010-09-27 14:09 ` [PATCH 2/4] Add ioctl FITRIM Lukas Czerner
2010-09-27 14:09 ` [PATCH 3/4] ext4: Add batched discard support for ext4 Lukas Czerner
2010-10-25 18:50   ` Ted Ts'o
2010-10-25 19:08     ` Ted Ts'o
2010-10-26 12:43     ` Lukas Czerner
2010-10-26 14:20       ` Ted Ts'o
2010-10-26 14:36         ` Lukas Czerner
2010-09-27 14:10 ` [PATCH 4/4] ext3: Add batched discard support for ext3 Lukas Czerner
2010-09-27 14:11 ` [PATCH 0/4 v. 9] Ext3/Ext4 Batched discard support - fstrim Lukas Czerner
2010-10-11 17:02 ` [PATCH 0/4 v. 9] Ext3/Ext4 Batched discard support Lukas Czerner
2010-10-25 14:57   ` Ted Ts'o
2010-10-25 16:06     ` Fstrim tool Lukas Czerner
