* [PATCH 1/2] ext3: Add batched discard support for ext3
@ 2010-11-22 11:29 Lukas Czerner
2010-11-22 11:29 ` [PATCH 2/2] ext3: Add FITRIM handle " Lukas Czerner
0 siblings, 1 reply; 17+ messages in thread
From: Lukas Czerner @ 2010-11-22 11:29 UTC (permalink / raw)
To: jack; +Cc: lczerner, tytso, linux-fsdevel, linux-ext4
Walk through allocation groups and trim all free extents. It can be
invoked through FITRIM ioctl on the file system. The main idea is to
provide a way to trim the whole file system if needed, since some SSD's
may suffer from performance loss after the whole device was filled (it
does not mean that fs is full!).
It search for free extents in allocation groups specified by Byte range
start -> start+len. When the free extent is within this range, blocks are
marked as used and then trimmed. Afterwards these blocks are marked as
free in per-group bitmap.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
fs/ext3/balloc.c | 262 +++++++++++++++++++++++++++++++++++++++++++++++
include/linux/ext3_fs.h | 1 +
2 files changed, 263 insertions(+), 0 deletions(-)
diff --git a/fs/ext3/balloc.c b/fs/ext3/balloc.c
index b3db226..8393abf 100644
--- a/fs/ext3/balloc.c
+++ b/fs/ext3/balloc.c
@@ -20,6 +20,7 @@
#include <linux/ext3_jbd.h>
#include <linux/quotaops.h>
#include <linux/buffer_head.h>
+#include <linux/blkdev.h>
/*
* balloc.c contains the blocks allocation and deallocation routines
@@ -39,6 +40,23 @@
#define in_range(b, first, len) ((b) >= (first) && (b) <= (first) + (len) - 1)
+/*
+ * Calculate the block group number and offset, given a block number
+ */
+void ext3_get_group_no_and_offset(struct super_block *sb, ext3_fsblk_t blocknr,
+ unsigned long *blockgrpp, ext3_grpblk_t *offsetp)
+{
+ struct ext3_super_block *es = EXT3_SB(sb)->s_es;
+ ext3_grpblk_t offset;
+
+ blocknr = blocknr - le32_to_cpu(es->s_first_data_block);
+ offset = do_div(blocknr, EXT3_BLOCKS_PER_GROUP(sb));
+ if (offsetp)
+ *offsetp = offset;
+ if (blockgrpp)
+ *blockgrpp = blocknr;
+}
+
/**
* ext3_get_group_desc() -- load group descriptor from disk
* @sb: super block
@@ -1885,3 +1903,247 @@ unsigned long ext3_bg_num_gdb(struct super_block *sb, int group)
return ext3_bg_num_gdb_meta(sb,group);
}
+
+/**
+ * ext3_trim_all_free -- function to trim all free space in alloc. group
+ * @sb: super block for file system
+ * @group: allocation group to trim
+ * @start: first group block to examine
+ * @max: last group block to examine
+ * @gdp: allocation group description structure
+ * @minblocks: minimum extent block count
+ *
+ * ext3_trim_all_free walks through group's block bitmap searching for free
+ * blocks. When the free block is found, it tries to allocate this block and
+ * consequent free block to get the biggest free extent possible, until it
+ * reaches any used block. Then issue a TRIM command on this extent and free
+ * the extent in the block bitmap. This is done until whole group is scanned.
+ */
+ext3_grpblk_t ext3_trim_all_free(struct super_block *sb, unsigned int group,
+ ext3_grpblk_t start, ext3_grpblk_t max,
+ ext3_grpblk_t minblocks)
+{
+ handle_t *handle;
+ ext3_grpblk_t next, free_blocks, bit, freed, count = 0;
+ ext3_fsblk_t discard_block;
+ struct ext3_sb_info *sbi;
+ struct buffer_head *gdp_bh, *bitmap_bh = NULL;
+ struct ext3_group_desc *gdp;
+ int err = 0, ret = 0;
+
+ /*
+ * We will update one block bitmap, and one group descriptor
+ */
+ handle = ext3_journal_start_sb(sb, 2);
+ if (IS_ERR(handle)) {
+ err = PTR_ERR(handle);
+ return err;
+ }
+
+ bitmap_bh = read_block_bitmap(sb, group);
+ if (!bitmap_bh)
+ goto err_out;
+
+ BUFFER_TRACE(bitmap_bh, "getting undo access");
+ err = ext3_journal_get_undo_access(handle, bitmap_bh);
+ if (err)
+ goto err_out;
+
+ gdp = ext3_get_group_desc(sb, group, &gdp_bh);
+ if (!gdp)
+ goto err_out;
+
+ BUFFER_TRACE(gdp_bh, "get_write_access");
+ err = ext3_journal_get_write_access(handle, gdp_bh);
+ if (err)
+ goto err_out;
+
+ free_blocks = le16_to_cpu(gdp->bg_free_blocks_count);
+ sbi = EXT3_SB(sb);
+
+ /* Walk through the whole group */
+ while (start < max) {
+
+ start = bitmap_search_next_usable_block(start, bitmap_bh, max);
+ if (start < 0)
+ break;
+ next = start;
+
+ /*
+ * Allocate contiguous free extents by setting bits in the
+ * block bitmap
+ */
+ while (next < max
+ && claim_block(sb_bgl_lock(sbi, group),
+ next, bitmap_bh)) {
+ next++;
+ }
+
+ /* We did not claim any blocks */
+ if (next == start)
+ continue;
+
+ discard_block = (ext3_fsblk_t)start +
+ ext3_group_first_block_no(sb, group);
+
+ /* Update counters */
+ spin_lock(sb_bgl_lock(sbi, group));
+ le16_add_cpu(&gdp->bg_free_blocks_count, start - next);
+ spin_unlock(sb_bgl_lock(sbi, group));
+ percpu_counter_sub(&sbi->s_freeblocks_counter, next - start);
+
+ /* Do not issue a TRIM on extents smaller than minblocks */
+ if ((next - start) < minblocks)
+ goto free_extent;
+
+ /* Send the TRIM command down to the device */
+ ret = sb_issue_discard(sb, discard_block, next - start,
+ GFP_NOFS, 0);
+ count += (next - start);
+free_extent:
+ freed = 0;
+
+ /*
+ * Clear bits in the bitmap
+ */
+ for (bit = start; bit < next; bit++) {
+ BUFFER_TRACE(bitmap_bh, "clear bit");
+ if (!ext3_clear_bit_atomic(sb_bgl_lock(sbi, group),
+ bit, bitmap_bh->b_data)) {
+ ext3_error(sb, __func__,
+ "bit already cleared for block "E3FSBLK,
+ (unsigned long)bit);
+ BUFFER_TRACE(bitmap_bh, "bit already cleared");
+ } else {
+ freed++;
+ }
+ }
+
+ /* Update couters */
+ spin_lock(sb_bgl_lock(sbi, group));
+ le16_add_cpu(&gdp->bg_free_blocks_count, freed);
+ spin_unlock(sb_bgl_lock(sbi, group));
+ percpu_counter_add(&sbi->s_freeblocks_counter, next - start);
+
+ start = next;
+ if (ret < 0) {
+ if (ret == -EOPNOTSUPP) {
+ ext3_warning(sb, __func__,
+ "discard not supported!");
+ count = ret;
+ break;
+ }
+ err = ret;
+ break;
+ }
+
+ if (fatal_signal_pending(current)) {
+ count = -ERESTARTSYS;
+ break;
+ }
+
+ cond_resched();
+
+ /* No more suitable extents */
+ if ((free_blocks - count) < minblocks)
+ break;
+ }
+
+ /* We dirtied the bitmap block */
+ BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
+ err = ext3_journal_dirty_metadata(handle, bitmap_bh);
+
+ /* And the group descriptor block */
+ BUFFER_TRACE(gdp_bh, "dirtied group descriptor block");
+ ret = ext3_journal_dirty_metadata(handle, gdp_bh);
+ if (!err)
+ err = ret;
+
+ ext3_debug("trimmed %d blocks in the group %d\n",
+ count, group);
+
+err_out:
+ if (err) {
+ ext3_std_error(sb, err);
+ count = err;
+ }
+ ext3_journal_stop(handle);
+ brelse(bitmap_bh);
+
+ return count;
+}
+
+/**
+ * ext3_trim_fs() -- trim ioctl handle function
+ * @sb: superblock for filesystem
+ * @start: First Byte to trim
+ * @len: number of Bytes to trim from start
+ * @minlen: minimum extent length in Bytes
+ *
+ * ext3_trim_fs goes through all allocation groups containing Bytes from
+ * start to start+len. For each such a group ext3_trim_all_free function
+ * is invoked to trim all free space.
+ */
+int ext3_trim_fs(struct super_block *sb, struct fstrim_range *range)
+{
+ ext3_grpblk_t last_block, first_block, free_blocks;
+ unsigned long first_group, last_group;
+ unsigned long group, ngroups;
+ struct ext3_group_desc *gdp;
+ struct ext3_super_block *es;
+ uint64_t start, len, minlen, trimmed;
+ int ret = 0;
+
+ start = range->start >> sb->s_blocksize_bits;
+ len = range->len >> sb->s_blocksize_bits;
+ minlen = range->minlen >> sb->s_blocksize_bits;
+ trimmed = 0;
+
+ if (unlikely(minlen > EXT3_BLOCKS_PER_GROUP(sb)))
+ return -EINVAL;
+
+ es = EXT3_SB(sb)->s_es;
+ ngroups = EXT3_SB(sb)->s_groups_count;
+ smp_rmb();
+
+ /* Determine first and last group to examine based on start and len */
+ ext3_get_group_no_and_offset(sb, (ext3_fsblk_t) start,
+ &first_group, &first_block);
+ ext3_get_group_no_and_offset(sb, (ext3_fsblk_t) (start + len),
+ &last_group, &last_block);
+ last_group = (last_group > ngroups - 1) ? ngroups - 1 : last_group;
+ last_block = EXT3_BLOCKS_PER_GROUP(sb);
+
+ if (first_group > last_group)
+ return -EINVAL;
+
+ for (group = first_group; group <= last_group; group++) {
+ gdp = ext3_get_group_desc(sb, group, NULL);
+ if (!gdp)
+ break;
+
+ free_blocks = le16_to_cpu(gdp->bg_free_blocks_count);
+ if (free_blocks < minlen)
+ continue;
+
+ if (len >= EXT3_BLOCKS_PER_GROUP(sb))
+ len -= (EXT3_BLOCKS_PER_GROUP(sb) - first_block);
+ else
+ last_block = len;
+
+ ret = ext3_trim_all_free(sb, group, first_block,
+ last_block, minlen);
+ if (ret < 0)
+ break;
+
+ trimmed += ret;
+ first_block = 0;
+ }
+
+ if (ret >= 0)
+ ret = 0;
+
+ range->len = trimmed * sb->s_blocksize;
+
+ return ret;
+}
diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
index 6ce1bca..a443965 100644
--- a/include/linux/ext3_fs.h
+++ b/include/linux/ext3_fs.h
@@ -856,6 +856,7 @@ extern struct ext3_group_desc * ext3_get_group_desc(struct super_block * sb,
extern int ext3_should_retry_alloc(struct super_block *sb, int *retries);
extern void ext3_init_block_alloc_info(struct inode *);
extern void ext3_rsv_window_add(struct super_block *sb, struct ext3_reserve_window_node *rsv);
+extern int ext3_trim_fs(struct super_block *sb, struct fstrim_range *range);
/* dir.c */
extern int ext3_check_dir_entry(const char *, struct inode *,
--
1.7.2.3
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 2/2] ext3: Add FITRIM handle for ext3
2010-11-22 11:29 [PATCH 1/2] ext3: Add batched discard support for ext3 Lukas Czerner
@ 2010-11-22 11:29 ` Lukas Czerner
2010-11-22 16:13 ` Greg Freemyer
2010-11-22 17:42 ` Jan Kara
0 siblings, 2 replies; 17+ messages in thread
From: Lukas Czerner @ 2010-11-22 11:29 UTC (permalink / raw)
To: jack; +Cc: lczerner, tytso, linux-fsdevel, linux-ext4
It takes fstrim_range structure as an argument. fstrim_range is definec in
the include/linux/fs.h.
After the FITRIM is done, the number of actually discarded Bytes is stored
in fstrim_range.len to give the user better insight on how much storage
space has been really released for wear-leveling.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
fs/ext3/ioctl.c | 22 ++++++++++++++++++++++
1 files changed, 22 insertions(+), 0 deletions(-)
diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
index 8897481..fc080dd 100644
--- a/fs/ext3/ioctl.c
+++ b/fs/ext3/ioctl.c
@@ -276,7 +276,29 @@ group_add_out:
mnt_drop_write(filp->f_path.mnt);
return err;
}
+ case FITRIM: {
+ struct super_block *sb = inode->i_sb;
+ struct fstrim_range range;
+ int ret = 0;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ if (copy_from_user(&range, (struct fstrim_range *)arg,
+ sizeof(range)))
+ return -EFAULT;
+
+ ret = ext3_trim_fs(sb, &range);
+ if (ret < 0)
+ return ret;
+
+ if (copy_to_user((struct fstrim_range *)arg, &range,
+ sizeof(range)))
+ return -EFAULT;
+
+ return 0;
+ }
default:
return -ENOTTY;
--
1.7.2.3
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] ext3: Add FITRIM handle for ext3
2010-11-22 11:29 ` [PATCH 2/2] ext3: Add FITRIM handle " Lukas Czerner
@ 2010-11-22 16:13 ` Greg Freemyer
2010-11-23 11:06 ` Lukas Czerner
2010-11-22 17:42 ` Jan Kara
1 sibling, 1 reply; 17+ messages in thread
From: Greg Freemyer @ 2010-11-22 16:13 UTC (permalink / raw)
To: Lukas Czerner; +Cc: jack, tytso, linux-fsdevel, linux-ext4
On Mon, Nov 22, 2010 at 6:29 AM, Lukas Czerner <lczerner@redhat.com> wrote:
> It takes fstrim_range structure as an argument. fstrim_range is definec in
> the include/linux/fs.h.
>
> After the FITRIM is done, the number of actually discarded Bytes is stored
> in fstrim_range.len to give the user better insight on how much storage
> space has been really released for wear-leveling.
>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
That's a misleading description in my opinion, from my understanding
this is more accurate:
===
After the FITRIM is done, the number of bytes passed from the
filesystem down the block stack to the device for potential discard is
stored in fstrim_range.len. This number is a maximum discard amount
from the storage device's perspective, because FITRIM called repeated
will keep sending the same sectors for discard repeatedly.
fstrim_range.len will report the same potential discard bytes each
time, but only sectors which had been written to between the discards
would actually be discarded by the storage device. Further, the
kernel block layer reserves the right to adjust the discard ranges to
fit raid stripe geometry, non-trim capable devices in a LVM setup,
etc. These reductions would not be reflected in fstrim_range.len.
As 2.6.37, the kernel block layer does not fully support discard and
as such will simply ignore all discard requests sent to volumes
created by device mapper or mdraid. This is done in a silent way, so
these failures to discard are also not reflected in fstrim_range.len.
Thus fstrim_range.len can give the user better insight on how much
storage space has potentially been released for wear-leveling, but it
needs to be one of only one criteria the userspace tools take into
account when trying to optimize calls to FITRIM.
===
Obviously, I'd like to also see that also in API documentation for
FITRIM. (And correct me if I'm wrong about device mapper / mdraid.
I'd love to be wrong about that statement..)
Greg
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] ext3: Add FITRIM handle for ext3
2010-11-22 16:13 ` Greg Freemyer
@ 2010-11-23 11:06 ` Lukas Czerner
0 siblings, 0 replies; 17+ messages in thread
From: Lukas Czerner @ 2010-11-23 11:06 UTC (permalink / raw)
To: Greg Freemyer
Cc: Lukas Czerner, jack, tytso, linux-fsdevel, linux-ext4,
Mike Snitzer
On Mon, 22 Nov 2010, Greg Freemyer wrote:
> On Mon, Nov 22, 2010 at 6:29 AM, Lukas Czerner <lczerner@redhat.com> wrote:
> > It takes fstrim_range structure as an argument. fstrim_range is definec in
> > the include/linux/fs.h.
> >
> > After the FITRIM is done, the number of actually discarded Bytes is stored
> > in fstrim_range.len to give the user better insight on how much storage
> > space has been really released for wear-leveling.
> >
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
>
> That's a misleading description in my opinion, from my understanding
> this is more accurate:
>
> ===
> After the FITRIM is done, the number of bytes passed from the
> filesystem down the block stack to the device for potential discard is
> stored in fstrim_range.len. This number is a maximum discard amount
> from the storage device's perspective, because FITRIM called repeated
> will keep sending the same sectors for discard repeatedly.
> fstrim_range.len will report the same potential discard bytes each
> time, but only sectors which had been written to between the discards
> would actually be discarded by the storage device. Further, the
> kernel block layer reserves the right to adjust the discard ranges to
> fit raid stripe geometry, non-trim capable devices in a LVM setup,
> etc. These reductions would not be reflected in fstrim_range.len.
>
> As 2.6.37, the kernel block layer does not fully support discard and
> as such will simply ignore all discard requests sent to volumes
> created by device mapper or mdraid. This is done in a silent way, so
> these failures to discard are also not reflected in fstrim_range.len.
I think this is not entirely true, because we have discard support for
dm linear and stripe:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=5ae89a8720c28caf35c4e53711d77df2856c404e
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7b76ec11fec40203836b488496d2df082d5b2022
(Adding Mike Snitzer into cc)
>
> Thus fstrim_range.len can give the user better insight on how much
> storage space has potentially been released for wear-leveling, but it
> needs to be one of only one criteria the userspace tools take into
> account when trying to optimize calls to FITRIM.
> ===
>
> Obviously, I'd like to also see that also in API documentation for
> FITRIM. (And correct me if I'm wrong about device mapper / mdraid.
> I'd love to be wrong about that statement..)
>
> Greg
>
Greg, thank you for this, aside the dm thing, this is definitely a lot
better explanation and it would be nice to include this into commit
message (it is too late for ext4:).
Jan, do you want me to repost this with new commit message or you can
add it yourself ?
Thanks!
-Lukas
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] ext3: Add FITRIM handle for ext3
2010-11-22 11:29 ` [PATCH 2/2] ext3: Add FITRIM handle " Lukas Czerner
2010-11-22 16:13 ` Greg Freemyer
@ 2010-11-22 17:42 ` Jan Kara
2010-11-23 10:32 ` Lukas Czerner
1 sibling, 1 reply; 17+ messages in thread
From: Jan Kara @ 2010-11-22 17:42 UTC (permalink / raw)
To: Lukas Czerner; +Cc: jack, tytso, linux-fsdevel, linux-ext4
On Mon 22-11-10 12:29:18, Lukas Czerner wrote:
> It takes fstrim_range structure as an argument. fstrim_range is definec in
> the include/linux/fs.h.
>
> After the FITRIM is done, the number of actually discarded Bytes is stored
> in fstrim_range.len to give the user better insight on how much storage
> space has been really released for wear-leveling.
Umm, why do we have to do this when FITRIM is already handled in
fs/ioctl.c? I'd expect us to just provide .trim_fs ioctl, no?
Honza
>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> ---
> fs/ext3/ioctl.c | 22 ++++++++++++++++++++++
> 1 files changed, 22 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
> index 8897481..fc080dd 100644
> --- a/fs/ext3/ioctl.c
> +++ b/fs/ext3/ioctl.c
> @@ -276,7 +276,29 @@ group_add_out:
> mnt_drop_write(filp->f_path.mnt);
> return err;
> }
> + case FITRIM: {
>
> + struct super_block *sb = inode->i_sb;
> + struct fstrim_range range;
> + int ret = 0;
> +
> + if (!capable(CAP_SYS_ADMIN))
> + return -EPERM;
> +
> + if (copy_from_user(&range, (struct fstrim_range *)arg,
> + sizeof(range)))
> + return -EFAULT;
> +
> + ret = ext3_trim_fs(sb, &range);
> + if (ret < 0)
> + return ret;
> +
> + if (copy_to_user((struct fstrim_range *)arg, &range,
> + sizeof(range)))
> + return -EFAULT;
> +
> + return 0;
> + }
>
> default:
> return -ENOTTY;
> --
> 1.7.2.3
>
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] ext3: Add FITRIM handle for ext3
2010-11-22 17:42 ` Jan Kara
@ 2010-11-23 10:32 ` Lukas Czerner
2010-11-24 13:01 ` Jan Kara
0 siblings, 1 reply; 17+ messages in thread
From: Lukas Czerner @ 2010-11-23 10:32 UTC (permalink / raw)
To: Jan Kara; +Cc: Lukas Czerner, tytso, linux-fsdevel, linux-ext4
On Mon, 22 Nov 2010, Jan Kara wrote:
> On Mon 22-11-10 12:29:18, Lukas Czerner wrote:
> > It takes fstrim_range structure as an argument. fstrim_range is definec in
> > the include/linux/fs.h.
> >
> > After the FITRIM is done, the number of actually discarded Bytes is stored
> > in fstrim_range.len to give the user better insight on how much storage
> > space has been really released for wear-leveling.
> Umm, why do we have to do this when FITRIM is already handled in
> fs/ioctl.c? I'd expect us to just provide .trim_fs ioctl, no?
Hi,
see upstream commits:
93bb41f4f8b89ac8b4d0a734bc59634cb0a29a89
e681c047e47c0abe67bf95857f23814372793cb0
unfortunately people was not happy with generic ioctl interface for that
purpose, there were concerned that it is no common enough to be included
in core vfs and there is no need for new super operation since each
filesystem can easily setup its own ioctl handling.
When I say people I need to clarify that it was mainly Christoph Hellwig
who had objections against the former implementation and I must say that
he had a point.
-Lukas
>
> Honza
> >
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > ---
> > fs/ext3/ioctl.c | 22 ++++++++++++++++++++++
> > 1 files changed, 22 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
> > index 8897481..fc080dd 100644
> > --- a/fs/ext3/ioctl.c
> > +++ b/fs/ext3/ioctl.c
> > @@ -276,7 +276,29 @@ group_add_out:
> > mnt_drop_write(filp->f_path.mnt);
> > return err;
> > }
> > + case FITRIM: {
> >
> > + struct super_block *sb = inode->i_sb;
> > + struct fstrim_range range;
> > + int ret = 0;
> > +
> > + if (!capable(CAP_SYS_ADMIN))
> > + return -EPERM;
> > +
> > + if (copy_from_user(&range, (struct fstrim_range *)arg,
> > + sizeof(range)))
> > + return -EFAULT;
> > +
> > + ret = ext3_trim_fs(sb, &range);
> > + if (ret < 0)
> > + return ret;
> > +
> > + if (copy_to_user((struct fstrim_range *)arg, &range,
> > + sizeof(range)))
> > + return -EFAULT;
> > +
> > + return 0;
> > + }
> >
> > default:
> > return -ENOTTY;
> > --
> > 1.7.2.3
> >
>
--
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] ext3: Add FITRIM handle for ext3
2010-11-23 10:32 ` Lukas Czerner
@ 2010-11-24 13:01 ` Jan Kara
2010-11-24 14:32 ` Lukas Czerner
0 siblings, 1 reply; 17+ messages in thread
From: Jan Kara @ 2010-11-24 13:01 UTC (permalink / raw)
To: Lukas Czerner; +Cc: Jan Kara, tytso, linux-fsdevel, linux-ext4
[-- Attachment #1: Type: text/plain, Size: 1947 bytes --]
On Tue 23-11-10 11:32:46, Lukas Czerner wrote:
> On Mon, 22 Nov 2010, Jan Kara wrote:
>
> > On Mon 22-11-10 12:29:18, Lukas Czerner wrote:
> > > It takes fstrim_range structure as an argument. fstrim_range is definec in
> > > the include/linux/fs.h.
> > >
> > > After the FITRIM is done, the number of actually discarded Bytes is stored
> > > in fstrim_range.len to give the user better insight on how much storage
> > > space has been really released for wear-leveling.
> > Umm, why do we have to do this when FITRIM is already handled in
> > fs/ioctl.c? I'd expect us to just provide .trim_fs ioctl, no?
>
> Hi,
>
> see upstream commits:
> 93bb41f4f8b89ac8b4d0a734bc59634cb0a29a89
> e681c047e47c0abe67bf95857f23814372793cb0
>
> unfortunately people was not happy with generic ioctl interface for that
> purpose, there were concerned that it is no common enough to be included
> in core vfs and there is no need for new super operation since each
> filesystem can easily setup its own ioctl handling.
>
> When I say people I need to clarify that it was mainly Christoph Hellwig
> who had objections against the former implementation and I must say that
> he had a point.
OK, I see. I had a fresh look at the patches and I've found a few
suboptimal things which I've fixed. Most notably I don't think we have to
issue a warning when underlying device does not support FITRIM. Returning
EOPNOTSUPP should be enough. And I also think that remounting the
filesystem read-only or even panicking (the result of calling
ext3_std_error()) is necessary when sb_issue_discard() fails for whatever
reason. Sure it is suspicious but we can cope just fine with that. BTW, I
think ext4 might want this as well. Finally, we were missing to set the
error in a few cases (e.g. when bitmap could not be loaded).
Attached is a diff with what I did and I'll repost what I currently have in
my tree.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
[-- Attachment #2: ext3-discard-update --]
[-- Type: text/plain, Size: 2564 bytes --]
diff --git a/fs/ext3/balloc.c b/fs/ext3/balloc.c
index 8393abf..878cbca 100644
--- a/fs/ext3/balloc.c
+++ b/fs/ext3/balloc.c
@@ -1935,14 +1935,14 @@ ext3_grpblk_t ext3_trim_all_free(struct super_block *sb, unsigned int group,
* We will update one block bitmap, and one group descriptor
*/
handle = ext3_journal_start_sb(sb, 2);
- if (IS_ERR(handle)) {
- err = PTR_ERR(handle);
- return err;
- }
+ if (IS_ERR(handle))
+ return PTR_ERR(handle);
bitmap_bh = read_block_bitmap(sb, group);
- if (!bitmap_bh)
+ if (!bitmap_bh) {
+ err = -EIO;
goto err_out;
+ }
BUFFER_TRACE(bitmap_bh, "getting undo access");
err = ext3_journal_get_undo_access(handle, bitmap_bh);
@@ -1950,8 +1950,10 @@ ext3_grpblk_t ext3_trim_all_free(struct super_block *sb, unsigned int group,
goto err_out;
gdp = ext3_get_group_desc(sb, group, &gdp_bh);
- if (!gdp)
+ if (!gdp) {
+ err = -EIO;
goto err_out;
+ }
BUFFER_TRACE(gdp_bh, "get_write_access");
err = ext3_journal_get_write_access(handle, gdp_bh);
@@ -1963,7 +1965,6 @@ ext3_grpblk_t ext3_trim_all_free(struct super_block *sb, unsigned int group,
/* Walk through the whole group */
while (start < max) {
-
start = bitmap_search_next_usable_block(start, bitmap_bh, max);
if (start < 0)
break;
@@ -1997,7 +1998,7 @@ ext3_grpblk_t ext3_trim_all_free(struct super_block *sb, unsigned int group,
goto free_extent;
/* Send the TRIM command down to the device */
- ret = sb_issue_discard(sb, discard_block, next - start,
+ err = sb_issue_discard(sb, discard_block, next - start,
GFP_NOFS, 0);
count += (next - start);
free_extent:
@@ -2023,22 +2024,18 @@ free_extent:
spin_lock(sb_bgl_lock(sbi, group));
le16_add_cpu(&gdp->bg_free_blocks_count, freed);
spin_unlock(sb_bgl_lock(sbi, group));
- percpu_counter_add(&sbi->s_freeblocks_counter, next - start);
+ percpu_counter_add(&sbi->s_freeblocks_counter, freed);
start = next;
- if (ret < 0) {
- if (ret == -EOPNOTSUPP) {
- ext3_warning(sb, __func__,
- "discard not supported!");
- count = ret;
- break;
- }
- err = ret;
+ if (err < 0) {
+ if (err != -EOPNOTSUPP)
+ ext3_warning(sb, __func__, "Discard command "
+ "returned error %d\n", err);
break;
}
if (fatal_signal_pending(current)) {
- count = -ERESTARTSYS;
+ err = -ERESTARTSYS;
break;
}
@@ -2063,12 +2060,10 @@ free_extent:
count, group);
err_out:
- if (err) {
- ext3_std_error(sb, err);
+ if (err)
count = err;
- }
ext3_journal_stop(handle);
brelse(bitmap_bh);
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] ext3: Add FITRIM handle for ext3
2010-11-24 13:01 ` Jan Kara
@ 2010-11-24 14:32 ` Lukas Czerner
2010-11-24 16:32 ` Jan Kara
0 siblings, 1 reply; 17+ messages in thread
From: Lukas Czerner @ 2010-11-24 14:32 UTC (permalink / raw)
To: Jan Kara; +Cc: Lukas Czerner, tytso, linux-fsdevel, linux-ext4
On Wed, 24 Nov 2010, Jan Kara wrote:
> On Tue 23-11-10 11:32:46, Lukas Czerner wrote:
> > On Mon, 22 Nov 2010, Jan Kara wrote:
> >
> > > On Mon 22-11-10 12:29:18, Lukas Czerner wrote:
> > > > It takes fstrim_range structure as an argument. fstrim_range is definec in
> > > > the include/linux/fs.h.
> > > >
> > > > After the FITRIM is done, the number of actually discarded Bytes is stored
> > > > in fstrim_range.len to give the user better insight on how much storage
> > > > space has been really released for wear-leveling.
> > > Umm, why do we have to do this when FITRIM is already handled in
> > > fs/ioctl.c? I'd expect us to just provide .trim_fs ioctl, no?
> >
> > Hi,
> >
> > see upstream commits:
> > 93bb41f4f8b89ac8b4d0a734bc59634cb0a29a89
> > e681c047e47c0abe67bf95857f23814372793cb0
> >
> > unfortunately people was not happy with generic ioctl interface for that
> > purpose, there were concerned that it is no common enough to be included
> > in core vfs and there is no need for new super operation since each
> > filesystem can easily setup its own ioctl handling.
> >
> > When I say people I need to clarify that it was mainly Christoph Hellwig
> > who had objections against the former implementation and I must say that
> > he had a point.
> OK, I see. I had a fresh look at the patches and I've found a few
> suboptimal things which I've fixed. Most notably I don't think we have to
> issue a warning when underlying device does not support FITRIM. Returning
> EOPNOTSUPP should be enough. And I also think that remounting the
> filesystem read-only or even panicking (the result of calling
> ext3_std_error()) is necessary when sb_issue_discard() fails for whatever
Did you meant "is NOT necessary" ? Just for clarification.
> reason. Sure it is suspicious but we can cope just fine with that. BTW, I
> think ext4 might want this as well. Finally, we were missing to set the
> error in a few cases (e.g. when bitmap could not be loaded).
>
> Attached is a diff with what I did and I'll repost what I currently have in
> my tree.
>
> Honza
>
Hi,
thanks a lot for your review and fixes. I already posted the patch for
ext4 (remove warning etc.) yesterday. So maybe we should get rid of the
warning message completely (ext3_warning) and just let userspace handle
the error and print out warning ? Otherwise diff looks good.
Thanks!
-Lukas
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] ext3: Add FITRIM handle for ext3
2010-11-24 14:32 ` Lukas Czerner
@ 2010-11-24 16:32 ` Jan Kara
0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2010-11-24 16:32 UTC (permalink / raw)
To: Lukas Czerner; +Cc: Jan Kara, tytso, linux-fsdevel, linux-ext4
On Wed 24-11-10 15:32:59, Lukas Czerner wrote:
> On Wed, 24 Nov 2010, Jan Kara wrote:
> > On Tue 23-11-10 11:32:46, Lukas Czerner wrote:
> > > On Mon, 22 Nov 2010, Jan Kara wrote:
> > >
> > > > On Mon 22-11-10 12:29:18, Lukas Czerner wrote:
> > > > > It takes fstrim_range structure as an argument. fstrim_range is definec in
> > > > > the include/linux/fs.h.
> > > > >
> > > > > After the FITRIM is done, the number of actually discarded Bytes is stored
> > > > > in fstrim_range.len to give the user better insight on how much storage
> > > > > space has been really released for wear-leveling.
> > > > Umm, why do we have to do this when FITRIM is already handled in
> > > > fs/ioctl.c? I'd expect us to just provide .trim_fs ioctl, no?
> > >
> > > Hi,
> > >
> > > see upstream commits:
> > > 93bb41f4f8b89ac8b4d0a734bc59634cb0a29a89
> > > e681c047e47c0abe67bf95857f23814372793cb0
> > >
> > > unfortunately people was not happy with generic ioctl interface for that
> > > purpose, there were concerned that it is no common enough to be included
> > > in core vfs and there is no need for new super operation since each
> > > filesystem can easily setup its own ioctl handling.
> > >
> > > When I say people I need to clarify that it was mainly Christoph Hellwig
> > > who had objections against the former implementation and I must say that
> > > he had a point.
> > OK, I see. I had a fresh look at the patches and I've found a few
> > suboptimal things which I've fixed. Most notably I don't think we have to
> > issue a warning when underlying device does not support FITRIM. Returning
> > EOPNOTSUPP should be enough. And I also think that remounting the
> > filesystem read-only or even panicking (the result of calling
> > ext3_std_error()) is necessary when sb_issue_discard() fails for whatever
>
> Did you meant "is NOT necessary" ? Just for clarification.
Yes.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 0/2] Ext3 discard support
@ 2010-11-24 13:06 Jan Kara
2010-11-24 13:06 ` [PATCH 1/2] ext3: Add batched discard support for ext3 Jan Kara
0 siblings, 1 reply; 17+ messages in thread
From: Jan Kara @ 2010-11-24 13:06 UTC (permalink / raw)
To: linux-ext4; +Cc: lczerner, greg.freemyer, linux-fsdevel
Hello,
here's Lukas's series with a few minor fixes in error handling in the first
patch and updated changelog of the second patch. I have these patches in my
tree so please speak up if you disagree (otherwise they'll go to Linus in the
next merge window).
Honza
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/2] ext3: Add batched discard support for ext3
2010-11-24 13:06 [PATCH 0/2] Ext3 discard support Jan Kara
@ 2010-11-24 13:06 ` Jan Kara
2010-11-24 14:56 ` Lukas Czerner
2010-11-25 14:27 ` Lukas Czerner
0 siblings, 2 replies; 17+ messages in thread
From: Jan Kara @ 2010-11-24 13:06 UTC (permalink / raw)
To: linux-ext4; +Cc: lczerner, greg.freemyer, linux-fsdevel, Jan Kara
From: Lukas Czerner <lczerner@redhat.com>
Walk through allocation groups and trim all free extents. It can be
invoked through FITRIM ioctl on the file system. The main idea is to
provide a way to trim the whole file system if needed, since some SSD's
may suffer from performance loss after the whole device was filled (it
does not mean that fs is full!).
It search for free extents in allocation groups specified by Byte range
start -> start+len. When the free extent is within this range, blocks are
marked as used and then trimmed. Afterwards these blocks are marked as
free in per-group bitmap.
[JK: Fixed up error handling]
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext3/balloc.c | 257 +++++++++++++++++++++++++++++++++++++++++++++++
include/linux/ext3_fs.h | 1 +
2 files changed, 258 insertions(+), 0 deletions(-)
diff --git a/fs/ext3/balloc.c b/fs/ext3/balloc.c
index b3db226..ae88960 100644
--- a/fs/ext3/balloc.c
+++ b/fs/ext3/balloc.c
@@ -20,6 +20,7 @@
#include <linux/ext3_jbd.h>
#include <linux/quotaops.h>
#include <linux/buffer_head.h>
+#include <linux/blkdev.h>
/*
* balloc.c contains the blocks allocation and deallocation routines
@@ -39,6 +40,23 @@
#define in_range(b, first, len) ((b) >= (first) && (b) <= (first) + (len) - 1)
+/*
+ * Calculate the block group number and offset, given a block number
+ */
+void ext3_get_group_no_and_offset(struct super_block *sb, ext3_fsblk_t blocknr,
+ unsigned long *blockgrpp, ext3_grpblk_t *offsetp)
+{
+ struct ext3_super_block *es = EXT3_SB(sb)->s_es;
+ ext3_grpblk_t offset;
+
+ blocknr = blocknr - le32_to_cpu(es->s_first_data_block);
+ offset = do_div(blocknr, EXT3_BLOCKS_PER_GROUP(sb));
+ if (offsetp)
+ *offsetp = offset;
+ if (blockgrpp)
+ *blockgrpp = blocknr;
+}
+
/**
* ext3_get_group_desc() -- load group descriptor from disk
* @sb: super block
@@ -1885,3 +1903,242 @@ unsigned long ext3_bg_num_gdb(struct super_block *sb, int group)
return ext3_bg_num_gdb_meta(sb,group);
}
+
+/**
+ * ext3_trim_all_free -- function to trim all free space in alloc. group
+ * @sb: super block for file system
+ * @group: allocation group to trim
+ * @start: first group block to examine
+ * @max: last group block to examine
+ * @gdp: allocation group description structure
+ * @minblocks: minimum extent block count
+ *
+ * ext3_trim_all_free walks through group's block bitmap searching for free
+ * blocks. When the free block is found, it tries to allocate this block and
+ * consequent free block to get the biggest free extent possible, until it
+ * reaches any used block. Then issue a TRIM command on this extent and free
+ * the extent in the block bitmap. This is done until whole group is scanned.
+ */
+ext3_grpblk_t ext3_trim_all_free(struct super_block *sb, unsigned int group,
+ ext3_grpblk_t start, ext3_grpblk_t max,
+ ext3_grpblk_t minblocks)
+{
+ handle_t *handle;
+ ext3_grpblk_t next, free_blocks, bit, freed, count = 0;
+ ext3_fsblk_t discard_block;
+ struct ext3_sb_info *sbi;
+ struct buffer_head *gdp_bh, *bitmap_bh = NULL;
+ struct ext3_group_desc *gdp;
+ int err = 0, ret = 0;
+
+ /*
+ * We will update one block bitmap, and one group descriptor
+ */
+ handle = ext3_journal_start_sb(sb, 2);
+ if (IS_ERR(handle))
+ return PTR_ERR(handle);
+
+ bitmap_bh = read_block_bitmap(sb, group);
+ if (!bitmap_bh) {
+ err = -EIO;
+ goto err_out;
+ }
+
+ BUFFER_TRACE(bitmap_bh, "getting undo access");
+ err = ext3_journal_get_undo_access(handle, bitmap_bh);
+ if (err)
+ goto err_out;
+
+ gdp = ext3_get_group_desc(sb, group, &gdp_bh);
+ if (!gdp) {
+ err = -EIO;
+ goto err_out;
+ }
+
+ BUFFER_TRACE(gdp_bh, "get_write_access");
+ err = ext3_journal_get_write_access(handle, gdp_bh);
+ if (err)
+ goto err_out;
+
+ free_blocks = le16_to_cpu(gdp->bg_free_blocks_count);
+ sbi = EXT3_SB(sb);
+
+ /* Walk through the whole group */
+ while (start < max) {
+ start = bitmap_search_next_usable_block(start, bitmap_bh, max);
+ if (start < 0)
+ break;
+ next = start;
+
+ /*
+ * Allocate contiguous free extents by setting bits in the
+ * block bitmap
+ */
+ while (next < max
+ && claim_block(sb_bgl_lock(sbi, group),
+ next, bitmap_bh)) {
+ next++;
+ }
+
+ /* We did not claim any blocks */
+ if (next == start)
+ continue;
+
+ discard_block = (ext3_fsblk_t)start +
+ ext3_group_first_block_no(sb, group);
+
+ /* Update counters */
+ spin_lock(sb_bgl_lock(sbi, group));
+ le16_add_cpu(&gdp->bg_free_blocks_count, start - next);
+ spin_unlock(sb_bgl_lock(sbi, group));
+ percpu_counter_sub(&sbi->s_freeblocks_counter, next - start);
+
+ /* Do not issue a TRIM on extents smaller than minblocks */
+ if ((next - start) < minblocks)
+ goto free_extent;
+
+ /* Send the TRIM command down to the device */
+ err = sb_issue_discard(sb, discard_block, next - start,
+ GFP_NOFS, 0);
+ count += (next - start);
+free_extent:
+ freed = 0;
+
+ /*
+ * Clear bits in the bitmap
+ */
+ for (bit = start; bit < next; bit++) {
+ BUFFER_TRACE(bitmap_bh, "clear bit");
+ if (!ext3_clear_bit_atomic(sb_bgl_lock(sbi, group),
+ bit, bitmap_bh->b_data)) {
+ ext3_error(sb, __func__,
+ "bit already cleared for block "E3FSBLK,
+ (unsigned long)bit);
+ BUFFER_TRACE(bitmap_bh, "bit already cleared");
+ } else {
+ freed++;
+ }
+ }
+
+ /* Update couters */
+ spin_lock(sb_bgl_lock(sbi, group));
+ le16_add_cpu(&gdp->bg_free_blocks_count, freed);
+ spin_unlock(sb_bgl_lock(sbi, group));
+ percpu_counter_add(&sbi->s_freeblocks_counter, freed);
+
+ start = next;
+ if (err < 0) {
+ if (err != -EOPNOTSUPP)
+ ext3_warning(sb, __func__, "Discard command "
+ "returned error %d\n", err);
+ break;
+ }
+
+ if (fatal_signal_pending(current)) {
+ err = -ERESTARTSYS;
+ break;
+ }
+
+ cond_resched();
+
+ /* No more suitable extents */
+ if ((free_blocks - count) < minblocks)
+ break;
+ }
+
+ /* We dirtied the bitmap block */
+ BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
+ err = ext3_journal_dirty_metadata(handle, bitmap_bh);
+
+ /* And the group descriptor block */
+ BUFFER_TRACE(gdp_bh, "dirtied group descriptor block");
+ ret = ext3_journal_dirty_metadata(handle, gdp_bh);
+ if (!err)
+ err = ret;
+
+ ext3_debug("trimmed %d blocks in the group %d\n",
+ count, group);
+
+err_out:
+ if (err)
+ count = err;
+ ext3_journal_stop(handle);
+ brelse(bitmap_bh);
+
+ return count;
+}
+
+/**
+ * ext3_trim_fs() -- trim ioctl handle function
+ * @sb: superblock for filesystem
+ * @start: First Byte to trim
+ * @len: number of Bytes to trim from start
+ * @minlen: minimum extent length in Bytes
+ *
+ * ext3_trim_fs goes through all allocation groups containing Bytes from
+ * start to start+len. For each such a group ext3_trim_all_free function
+ * is invoked to trim all free space.
+ */
+int ext3_trim_fs(struct super_block *sb, struct fstrim_range *range)
+{
+ ext3_grpblk_t last_block, first_block, free_blocks;
+ unsigned long first_group, last_group;
+ unsigned long group, ngroups;
+ struct ext3_group_desc *gdp;
+ struct ext3_super_block *es;
+ uint64_t start, len, minlen, trimmed;
+ int ret = 0;
+
+ start = range->start >> sb->s_blocksize_bits;
+ len = range->len >> sb->s_blocksize_bits;
+ minlen = range->minlen >> sb->s_blocksize_bits;
+ trimmed = 0;
+
+ if (unlikely(minlen > EXT3_BLOCKS_PER_GROUP(sb)))
+ return -EINVAL;
+
+ es = EXT3_SB(sb)->s_es;
+ ngroups = EXT3_SB(sb)->s_groups_count;
+ smp_rmb();
+
+ /* Determine first and last group to examine based on start and len */
+ ext3_get_group_no_and_offset(sb, (ext3_fsblk_t) start,
+ &first_group, &first_block);
+ ext3_get_group_no_and_offset(sb, (ext3_fsblk_t) (start + len),
+ &last_group, &last_block);
+ last_group = (last_group > ngroups - 1) ? ngroups - 1 : last_group;
+ last_block = EXT3_BLOCKS_PER_GROUP(sb);
+
+ if (first_group > last_group)
+ return -EINVAL;
+
+ for (group = first_group; group <= last_group; group++) {
+ gdp = ext3_get_group_desc(sb, group, NULL);
+ if (!gdp)
+ break;
+
+ free_blocks = le16_to_cpu(gdp->bg_free_blocks_count);
+ if (free_blocks < minlen)
+ continue;
+
+ if (len >= EXT3_BLOCKS_PER_GROUP(sb))
+ len -= (EXT3_BLOCKS_PER_GROUP(sb) - first_block);
+ else
+ last_block = len;
+
+ ret = ext3_trim_all_free(sb, group, first_block,
+ last_block, minlen);
+ if (ret < 0)
+ break;
+
+ trimmed += ret;
+ first_block = 0;
+ }
+
+ if (ret >= 0)
+ ret = 0;
+
+ range->len = trimmed * sb->s_blocksize;
+
+ return ret;
+}
diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
index 6ce1bca..a443965 100644
--- a/include/linux/ext3_fs.h
+++ b/include/linux/ext3_fs.h
@@ -856,6 +856,7 @@ extern struct ext3_group_desc * ext3_get_group_desc(struct super_block * sb,
extern int ext3_should_retry_alloc(struct super_block *sb, int *retries);
extern void ext3_init_block_alloc_info(struct inode *);
extern void ext3_rsv_window_add(struct super_block *sb, struct ext3_reserve_window_node *rsv);
+extern int ext3_trim_fs(struct super_block *sb, struct fstrim_range *range);
/* dir.c */
extern int ext3_check_dir_entry(const char *, struct inode *,
--
1.7.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] ext3: Add batched discard support for ext3
2010-11-24 13:06 ` [PATCH 1/2] ext3: Add batched discard support for ext3 Jan Kara
@ 2010-11-24 14:56 ` Lukas Czerner
2010-11-24 16:35 ` Jan Kara
2010-11-25 14:27 ` Lukas Czerner
1 sibling, 1 reply; 17+ messages in thread
From: Lukas Czerner @ 2010-11-24 14:56 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-ext4, lczerner, greg.freemyer, linux-fsdevel
On Wed, 24 Nov 2010, Jan Kara wrote:
> From: Lukas Czerner <lczerner@redhat.com>
>
> Walk through allocation groups and trim all free extents. It can be
> invoked through FITRIM ioctl on the file system. The main idea is to
> provide a way to trim the whole file system if needed, since some SSD's
> may suffer from performance loss after the whole device was filled (it
> does not mean that fs is full!).
>
> It search for free extents in allocation groups specified by Byte range
> start -> start+len. When the free extent is within this range, blocks are
> marked as used and then trimmed. Afterwards these blocks are marked as
> free in per-group bitmap.
>
> [JK: Fixed up error handling]
>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> fs/ext3/balloc.c | 257 +++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/ext3_fs.h | 1 +
> 2 files changed, 258 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext3/balloc.c b/fs/ext3/balloc.c
> index b3db226..ae88960 100644
> --- a/fs/ext3/balloc.c
> +++ b/fs/ext3/balloc.c
> @@ -20,6 +20,7 @@
> #include <linux/ext3_jbd.h>
> #include <linux/quotaops.h>
> #include <linux/buffer_head.h>
> +#include <linux/blkdev.h>
>
> /*
> * balloc.c contains the blocks allocation and deallocation routines
> @@ -39,6 +40,23 @@
>
> #define in_range(b, first, len) ((b) >= (first) && (b) <= (first) + (len) - 1)
>
> +/*
> + * Calculate the block group number and offset, given a block number
> + */
> +void ext3_get_group_no_and_offset(struct super_block *sb, ext3_fsblk_t blocknr,
> + unsigned long *blockgrpp, ext3_grpblk_t *offsetp)
> +{
> + struct ext3_super_block *es = EXT3_SB(sb)->s_es;
> + ext3_grpblk_t offset;
> +
> + blocknr = blocknr - le32_to_cpu(es->s_first_data_block);
> + offset = do_div(blocknr, EXT3_BLOCKS_PER_GROUP(sb));
> + if (offsetp)
> + *offsetp = offset;
> + if (blockgrpp)
> + *blockgrpp = blocknr;
> +}
> +
> /**
> * ext3_get_group_desc() -- load group descriptor from disk
> * @sb: super block
> @@ -1885,3 +1903,242 @@ unsigned long ext3_bg_num_gdb(struct super_block *sb, int group)
> return ext3_bg_num_gdb_meta(sb,group);
>
> }
> +
> +/**
> + * ext3_trim_all_free -- function to trim all free space in alloc. group
> + * @sb: super block for file system
> + * @group: allocation group to trim
> + * @start: first group block to examine
> + * @max: last group block to examine
> + * @gdp: allocation group description structure
> + * @minblocks: minimum extent block count
> + *
> + * ext3_trim_all_free walks through group's block bitmap searching for free
> + * blocks. When the free block is found, it tries to allocate this block and
> + * consequent free block to get the biggest free extent possible, until it
> + * reaches any used block. Then issue a TRIM command on this extent and free
> + * the extent in the block bitmap. This is done until whole group is scanned.
> + */
> +ext3_grpblk_t ext3_trim_all_free(struct super_block *sb, unsigned int group,
> + ext3_grpblk_t start, ext3_grpblk_t max,
> + ext3_grpblk_t minblocks)
> +{
> + handle_t *handle;
> + ext3_grpblk_t next, free_blocks, bit, freed, count = 0;
> + ext3_fsblk_t discard_block;
> + struct ext3_sb_info *sbi;
> + struct buffer_head *gdp_bh, *bitmap_bh = NULL;
> + struct ext3_group_desc *gdp;
> + int err = 0, ret = 0;
> +
> + /*
> + * We will update one block bitmap, and one group descriptor
> + */
> + handle = ext3_journal_start_sb(sb, 2);
> + if (IS_ERR(handle))
> + return PTR_ERR(handle);
> +
> + bitmap_bh = read_block_bitmap(sb, group);
> + if (!bitmap_bh) {
> + err = -EIO;
> + goto err_out;
> + }
> +
> + BUFFER_TRACE(bitmap_bh, "getting undo access");
> + err = ext3_journal_get_undo_access(handle, bitmap_bh);
> + if (err)
> + goto err_out;
> +
> + gdp = ext3_get_group_desc(sb, group, &gdp_bh);
> + if (!gdp) {
> + err = -EIO;
> + goto err_out;
> + }
> +
> + BUFFER_TRACE(gdp_bh, "get_write_access");
> + err = ext3_journal_get_write_access(handle, gdp_bh);
> + if (err)
> + goto err_out;
> +
> + free_blocks = le16_to_cpu(gdp->bg_free_blocks_count);
> + sbi = EXT3_SB(sb);
> +
> + /* Walk through the whole group */
> + while (start < max) {
> + start = bitmap_search_next_usable_block(start, bitmap_bh, max);
> + if (start < 0)
> + break;
> + next = start;
> +
> + /*
> + * Allocate contiguous free extents by setting bits in the
> + * block bitmap
> + */
> + while (next < max
> + && claim_block(sb_bgl_lock(sbi, group),
> + next, bitmap_bh)) {
> + next++;
> + }
> +
> + /* We did not claim any blocks */
> + if (next == start)
> + continue;
> +
> + discard_block = (ext3_fsblk_t)start +
> + ext3_group_first_block_no(sb, group);
> +
> + /* Update counters */
> + spin_lock(sb_bgl_lock(sbi, group));
> + le16_add_cpu(&gdp->bg_free_blocks_count, start - next);
> + spin_unlock(sb_bgl_lock(sbi, group));
> + percpu_counter_sub(&sbi->s_freeblocks_counter, next - start);
> +
> + /* Do not issue a TRIM on extents smaller than minblocks */
> + if ((next - start) < minblocks)
> + goto free_extent;
> +
> + /* Send the TRIM command down to the device */
> + err = sb_issue_discard(sb, discard_block, next - start,
> + GFP_NOFS, 0);
> + count += (next - start);
> +free_extent:
> + freed = 0;
> +
> + /*
> + * Clear bits in the bitmap
> + */
> + for (bit = start; bit < next; bit++) {
> + BUFFER_TRACE(bitmap_bh, "clear bit");
> + if (!ext3_clear_bit_atomic(sb_bgl_lock(sbi, group),
> + bit, bitmap_bh->b_data)) {
> + ext3_error(sb, __func__,
> + "bit already cleared for block "E3FSBLK,
> + (unsigned long)bit);
> + BUFFER_TRACE(bitmap_bh, "bit already cleared");
> + } else {
> + freed++;
> + }
> + }
> +
> + /* Update couters */
> + spin_lock(sb_bgl_lock(sbi, group));
> + le16_add_cpu(&gdp->bg_free_blocks_count, freed);
> + spin_unlock(sb_bgl_lock(sbi, group));
> + percpu_counter_add(&sbi->s_freeblocks_counter, freed);
> +
> + start = next;
> + if (err < 0) {
> + if (err != -EOPNOTSUPP)
> + ext3_warning(sb, __func__, "Discard command "
> + "returned error %d\n", err);
Maybe we can remove this warning completely and let use-space utility
handle and print out error message ?
> + break;
> + }
> +
> + if (fatal_signal_pending(current)) {
> + err = -ERESTARTSYS;
> + break;
> + }
> +
> + cond_resched();
> +
> + /* No more suitable extents */
> + if ((free_blocks - count) < minblocks)
> + break;
> + }
> +
> + /* We dirtied the bitmap block */
> + BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
> + err = ext3_journal_dirty_metadata(handle, bitmap_bh);
> +
> + /* And the group descriptor block */
> + BUFFER_TRACE(gdp_bh, "dirtied group descriptor block");
> + ret = ext3_journal_dirty_metadata(handle, gdp_bh);
> + if (!err)
> + err = ret;
> +
> + ext3_debug("trimmed %d blocks in the group %d\n",
> + count, group);
> +
> +err_out:
> + if (err)
> + count = err;
> + ext3_journal_stop(handle);
> + brelse(bitmap_bh);
> +
> + return count;
> +}
> +
> +/**
> + * ext3_trim_fs() -- trim ioctl handle function
> + * @sb: superblock for filesystem
> + * @start: First Byte to trim
> + * @len: number of Bytes to trim from start
> + * @minlen: minimum extent length in Bytes
> + *
> + * ext3_trim_fs goes through all allocation groups containing Bytes from
> + * start to start+len. For each such a group ext3_trim_all_free function
> + * is invoked to trim all free space.
> + */
> +int ext3_trim_fs(struct super_block *sb, struct fstrim_range *range)
> +{
> + ext3_grpblk_t last_block, first_block, free_blocks;
> + unsigned long first_group, last_group;
> + unsigned long group, ngroups;
> + struct ext3_group_desc *gdp;
> + struct ext3_super_block *es;
> + uint64_t start, len, minlen, trimmed;
> + int ret = 0;
> +
> + start = range->start >> sb->s_blocksize_bits;
> + len = range->len >> sb->s_blocksize_bits;
> + minlen = range->minlen >> sb->s_blocksize_bits;
> + trimmed = 0;
> +
> + if (unlikely(minlen > EXT3_BLOCKS_PER_GROUP(sb)))
> + return -EINVAL;
> +
> + es = EXT3_SB(sb)->s_es;
> + ngroups = EXT3_SB(sb)->s_groups_count;
> + smp_rmb();
> +
> + /* Determine first and last group to examine based on start and len */
> + ext3_get_group_no_and_offset(sb, (ext3_fsblk_t) start,
> + &first_group, &first_block);
> + ext3_get_group_no_and_offset(sb, (ext3_fsblk_t) (start + len),
> + &last_group, &last_block);
> + last_group = (last_group > ngroups - 1) ? ngroups - 1 : last_group;
> + last_block = EXT3_BLOCKS_PER_GROUP(sb);
> +
> + if (first_group > last_group)
> + return -EINVAL;
> +
> + for (group = first_group; group <= last_group; group++) {
> + gdp = ext3_get_group_desc(sb, group, NULL);
> + if (!gdp)
> + break;
> +
> + free_blocks = le16_to_cpu(gdp->bg_free_blocks_count);
> + if (free_blocks < minlen)
> + continue;
> +
> + if (len >= EXT3_BLOCKS_PER_GROUP(sb))
> + len -= (EXT3_BLOCKS_PER_GROUP(sb) - first_block);
> + else
> + last_block = len;
> +
> + ret = ext3_trim_all_free(sb, group, first_block,
> + last_block, minlen);
> + if (ret < 0)
> + break;
> +
> + trimmed += ret;
> + first_block = 0;
> + }
> +
> + if (ret >= 0)
> + ret = 0;
> +
> + range->len = trimmed * sb->s_blocksize;
> +
> + return ret;
> +}
> diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
> index 6ce1bca..a443965 100644
> --- a/include/linux/ext3_fs.h
> +++ b/include/linux/ext3_fs.h
> @@ -856,6 +856,7 @@ extern struct ext3_group_desc * ext3_get_group_desc(struct super_block * sb,
> extern int ext3_should_retry_alloc(struct super_block *sb, int *retries);
> extern void ext3_init_block_alloc_info(struct inode *);
> extern void ext3_rsv_window_add(struct super_block *sb, struct ext3_reserve_window_node *rsv);
> +extern int ext3_trim_fs(struct super_block *sb, struct fstrim_range *range);
>
> /* dir.c */
> extern int ext3_check_dir_entry(const char *, struct inode *,
>
--
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] ext3: Add batched discard support for ext3
2010-11-24 14:56 ` Lukas Czerner
@ 2010-11-24 16:35 ` Jan Kara
0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2010-11-24 16:35 UTC (permalink / raw)
To: Lukas Czerner; +Cc: Jan Kara, linux-ext4, greg.freemyer, linux-fsdevel
On Wed 24-11-10 15:56:33, Lukas Czerner wrote:
> On Wed, 24 Nov 2010, Jan Kara wrote:
> > + if (err < 0) {
> > + if (err != -EOPNOTSUPP)
> > + ext3_warning(sb, __func__, "Discard command "
> > + "returned error %d\n", err);
>
> Maybe we can remove this warning completely and let use-space utility
> handle and print out error message ?
We could but at this point something strange is happening (ENOMEM, EIO,
or something like that) so issuing a warning makes some sense. So I'd be
maybe slightly in favor of keeping the warning.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] ext3: Add batched discard support for ext3
2010-11-24 13:06 ` [PATCH 1/2] ext3: Add batched discard support for ext3 Jan Kara
2010-11-24 14:56 ` Lukas Czerner
@ 2010-11-25 14:27 ` Lukas Czerner
2010-11-25 18:29 ` Andreas Dilger
2011-01-06 14:26 ` Jan Kara
1 sibling, 2 replies; 17+ messages in thread
From: Lukas Czerner @ 2010-11-25 14:27 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-ext4, lczerner, greg.freemyer, linux-fsdevel
On Wed, 24 Nov 2010, Jan Kara wrote:
> From: Lukas Czerner <lczerner@redhat.com>
>
> Walk through allocation groups and trim all free extents. It can be
> invoked through FITRIM ioctl on the file system. The main idea is to
> provide a way to trim the whole file system if needed, since some SSD's
> may suffer from performance loss after the whole device was filled (it
> does not mean that fs is full!).
>
> It search for free extents in allocation groups specified by Byte range
> start -> start+len. When the free extent is within this range, blocks are
> marked as used and then trimmed. Afterwards these blocks are marked as
> free in per-group bitmap.
>
> [JK: Fixed up error handling]
>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> fs/ext3/balloc.c | 257 +++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/ext3_fs.h | 1 +
> 2 files changed, 258 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext3/balloc.c b/fs/ext3/balloc.c
> index b3db226..ae88960 100644
> --- a/fs/ext3/balloc.c
> +++ b/fs/ext3/balloc.c
> @@ -20,6 +20,7 @@
> #include <linux/ext3_jbd.h>
> #include <linux/quotaops.h>
> #include <linux/buffer_head.h>
> +#include <linux/blkdev.h>
>
> /*
> * balloc.c contains the blocks allocation and deallocation routines
> @@ -39,6 +40,23 @@
>
> #define in_range(b, first, len) ((b) >= (first) && (b) <= (first) + (len) - 1)
>
> +/*
> + * Calculate the block group number and offset, given a block number
> + */
> +void ext3_get_group_no_and_offset(struct super_block *sb, ext3_fsblk_t blocknr,
> + unsigned long *blockgrpp, ext3_grpblk_t *offsetp)
> +{
> + struct ext3_super_block *es = EXT3_SB(sb)->s_es;
> + ext3_grpblk_t offset;
> +
> + blocknr = blocknr - le32_to_cpu(es->s_first_data_block);
> + offset = do_div(blocknr, EXT3_BLOCKS_PER_GROUP(sb));
> + if (offsetp)
> + *offsetp = offset;
> + if (blockgrpp)
> + *blockgrpp = blocknr;
> +}
> +
> /**
> * ext3_get_group_desc() -- load group descriptor from disk
> * @sb: super block
> @@ -1885,3 +1903,242 @@ unsigned long ext3_bg_num_gdb(struct super_block *sb, int group)
> return ext3_bg_num_gdb_meta(sb,group);
>
> }
> +
> +/**
> + * ext3_trim_all_free -- function to trim all free space in alloc. group
> + * @sb: super block for file system
> + * @group: allocation group to trim
> + * @start: first group block to examine
> + * @max: last group block to examine
> + * @gdp: allocation group description structure
> + * @minblocks: minimum extent block count
> + *
> + * ext3_trim_all_free walks through group's block bitmap searching for free
> + * blocks. When the free block is found, it tries to allocate this block and
> + * consequent free block to get the biggest free extent possible, until it
> + * reaches any used block. Then issue a TRIM command on this extent and free
> + * the extent in the block bitmap. This is done until whole group is scanned.
> + */
> +ext3_grpblk_t ext3_trim_all_free(struct super_block *sb, unsigned int group,
> + ext3_grpblk_t start, ext3_grpblk_t max,
> + ext3_grpblk_t minblocks)
> +{
> + handle_t *handle;
> + ext3_grpblk_t next, free_blocks, bit, freed, count = 0;
> + ext3_fsblk_t discard_block;
> + struct ext3_sb_info *sbi;
> + struct buffer_head *gdp_bh, *bitmap_bh = NULL;
> + struct ext3_group_desc *gdp;
> + int err = 0, ret = 0;
> +
> + /*
> + * We will update one block bitmap, and one group descriptor
> + */
> + handle = ext3_journal_start_sb(sb, 2);
> + if (IS_ERR(handle))
> + return PTR_ERR(handle);
> +
> + bitmap_bh = read_block_bitmap(sb, group);
> + if (!bitmap_bh) {
> + err = -EIO;
> + goto err_out;
> + }
> +
> + BUFFER_TRACE(bitmap_bh, "getting undo access");
> + err = ext3_journal_get_undo_access(handle, bitmap_bh);
> + if (err)
> + goto err_out;
> +
> + gdp = ext3_get_group_desc(sb, group, &gdp_bh);
> + if (!gdp) {
> + err = -EIO;
> + goto err_out;
> + }
> +
> + BUFFER_TRACE(gdp_bh, "get_write_access");
> + err = ext3_journal_get_write_access(handle, gdp_bh);
> + if (err)
> + goto err_out;
> +
> + free_blocks = le16_to_cpu(gdp->bg_free_blocks_count);
> + sbi = EXT3_SB(sb);
> +
> + /* Walk through the whole group */
> + while (start < max) {
> + start = bitmap_search_next_usable_block(start, bitmap_bh, max);
> + if (start < 0)
> + break;
> + next = start;
> +
> + /*
> + * Allocate contiguous free extents by setting bits in the
> + * block bitmap
> + */
> + while (next < max
> + && claim_block(sb_bgl_lock(sbi, group),
> + next, bitmap_bh)) {
> + next++;
> + }
> +
> + /* We did not claim any blocks */
> + if (next == start)
> + continue;
> +
> + discard_block = (ext3_fsblk_t)start +
> + ext3_group_first_block_no(sb, group);
> +
> + /* Update counters */
> + spin_lock(sb_bgl_lock(sbi, group));
> + le16_add_cpu(&gdp->bg_free_blocks_count, start - next);
> + spin_unlock(sb_bgl_lock(sbi, group));
> + percpu_counter_sub(&sbi->s_freeblocks_counter, next - start);
> +
> + /* Do not issue a TRIM on extents smaller than minblocks */
> + if ((next - start) < minblocks)
> + goto free_extent;
> +
> + /* Send the TRIM command down to the device */
> + err = sb_issue_discard(sb, discard_block, next - start,
> + GFP_NOFS, 0);
> + count += (next - start);
> +free_extent:
> + freed = 0;
> +
> + /*
> + * Clear bits in the bitmap
> + */
> + for (bit = start; bit < next; bit++) {
> + BUFFER_TRACE(bitmap_bh, "clear bit");
> + if (!ext3_clear_bit_atomic(sb_bgl_lock(sbi, group),
> + bit, bitmap_bh->b_data)) {
> + ext3_error(sb, __func__,
> + "bit already cleared for block "E3FSBLK,
> + (unsigned long)bit);
> + BUFFER_TRACE(bitmap_bh, "bit already cleared");
> + } else {
> + freed++;
> + }
> + }
> +
> + /* Update couters */
> + spin_lock(sb_bgl_lock(sbi, group));
> + le16_add_cpu(&gdp->bg_free_blocks_count, freed);
> + spin_unlock(sb_bgl_lock(sbi, group));
> + percpu_counter_add(&sbi->s_freeblocks_counter, freed);
> +
> + start = next;
> + if (err < 0) {
> + if (err != -EOPNOTSUPP)
> + ext3_warning(sb, __func__, "Discard command "
> + "returned error %d\n", err);
> + break;
> + }
> +
> + if (fatal_signal_pending(current)) {
> + err = -ERESTARTSYS;
> + break;
> + }
> +
> + cond_resched();
> +
> + /* No more suitable extents */
> + if ((free_blocks - count) < minblocks)
> + break;
> + }
> +
> + /* We dirtied the bitmap block */
> + BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
> + err = ext3_journal_dirty_metadata(handle, bitmap_bh);
> +
> + /* And the group descriptor block */
> + BUFFER_TRACE(gdp_bh, "dirtied group descriptor block");
> + ret = ext3_journal_dirty_metadata(handle, gdp_bh);
> + if (!err)
> + err = ret;
> +
> + ext3_debug("trimmed %d blocks in the group %d\n",
> + count, group);
> +
> +err_out:
> + if (err)
> + count = err;
> + ext3_journal_stop(handle);
> + brelse(bitmap_bh);
> +
> + return count;
> +}
> +
> +/**
> + * ext3_trim_fs() -- trim ioctl handle function
> + * @sb: superblock for filesystem
> + * @start: First Byte to trim
> + * @len: number of Bytes to trim from start
> + * @minlen: minimum extent length in Bytes
> + *
> + * ext3_trim_fs goes through all allocation groups containing Bytes from
> + * start to start+len. For each such a group ext3_trim_all_free function
> + * is invoked to trim all free space.
> + */
> +int ext3_trim_fs(struct super_block *sb, struct fstrim_range *range)
> +{
> + ext3_grpblk_t last_block, first_block, free_blocks;
> + unsigned long first_group, last_group;
> + unsigned long group, ngroups;
> + struct ext3_group_desc *gdp;
> + struct ext3_super_block *es;
> + uint64_t start, len, minlen, trimmed;
> + int ret = 0;
We probably need to add this:
ext3_fsblk_t blocks_count = le32_to_cpu(EXT3_SB(sb)->s_es->s_blocks_count);
> +
> + start = range->start >> sb->s_blocksize_bits;
> + len = range->len >> sb->s_blocksize_bits;
> + minlen = range->minlen >> sb->s_blocksize_bits;
> + trimmed = 0;
and this:
if (len > blocks_count)
len = blocks_count - start;
Because when determining last group through ext3_get_group_no_and_offset()
the result may be wrong in cases when range->start and range-len are too
big, because it may overflow when summing up those two numbers.
> +
> + if (unlikely(minlen > EXT3_BLOCKS_PER_GROUP(sb)))
> + return -EINVAL;
> +
> + es = EXT3_SB(sb)->s_es;
> + ngroups = EXT3_SB(sb)->s_groups_count;
> + smp_rmb();
> +
> + /* Determine first and last group to examine based on start and len */
> + ext3_get_group_no_and_offset(sb, (ext3_fsblk_t) start,
> + &first_group, &first_block);
> + ext3_get_group_no_and_offset(sb, (ext3_fsblk_t) (start + len),
^^^^^^^^^^^^^
here
overflow may occur.
> + &last_group, &last_block);
> + last_group = (last_group > ngroups - 1) ? ngroups - 1 : last_group;
> + last_block = EXT3_BLOCKS_PER_GROUP(sb);
> +
> + if (first_group > last_group)
> + return -EINVAL;
> +
> + for (group = first_group; group <= last_group; group++) {
> + gdp = ext3_get_group_desc(sb, group, NULL);
> + if (!gdp)
> + break;
> +
> + free_blocks = le16_to_cpu(gdp->bg_free_blocks_count);
> + if (free_blocks < minlen)
> + continue;
> +
> + if (len >= EXT3_BLOCKS_PER_GROUP(sb))
> + len -= (EXT3_BLOCKS_PER_GROUP(sb) - first_block);
> + else
> + last_block = len;
> +
> + ret = ext3_trim_all_free(sb, group, first_block,
> + last_block, minlen);
> + if (ret < 0)
> + break;
> +
> + trimmed += ret;
> + first_block = 0;
> + }
> +
> + if (ret >= 0)
> + ret = 0;
> +
> + range->len = trimmed * sb->s_blocksize;
> +
> + return ret;
> +}
> diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
> index 6ce1bca..a443965 100644
> --- a/include/linux/ext3_fs.h
> +++ b/include/linux/ext3_fs.h
> @@ -856,6 +856,7 @@ extern struct ext3_group_desc * ext3_get_group_desc(struct super_block * sb,
> extern int ext3_should_retry_alloc(struct super_block *sb, int *retries);
> extern void ext3_init_block_alloc_info(struct inode *);
> extern void ext3_rsv_window_add(struct super_block *sb, struct ext3_reserve_window_node *rsv);
> +extern int ext3_trim_fs(struct super_block *sb, struct fstrim_range *range);
>
> /* dir.c */
> extern int ext3_check_dir_entry(const char *, struct inode *,
>
--
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] ext3: Add batched discard support for ext3
2010-11-25 14:27 ` Lukas Czerner
@ 2010-11-25 18:29 ` Andreas Dilger
2010-11-26 8:01 ` Lukas Czerner
2011-01-06 14:26 ` Jan Kara
1 sibling, 1 reply; 17+ messages in thread
From: Andreas Dilger @ 2010-11-25 18:29 UTC (permalink / raw)
To: Jan Kara, Lukas Czerner; +Cc: Ext4 Developers List, linux-fsdevel
On 2010-11-25, at 07:27, Lukas Czerner wrote:
>> Walk through allocation groups and trim all free extents. It can be
>> invoked through FITRIM ioctl on the file system. The main idea is to
>> provide a way to trim the whole file system if needed, since some SSD's
>> may suffer from performance loss after the whole device was filled (it
>> does not mean that fs is full!).
One question I have is why a major change like this is being done in ext3 instead of only in ext4? There are a lot of ext4 features that _could_ be included in ext3 (basically all of them), but the request from the rest of the kernel developers was to leave ext3 with minimal changes (for continued stability), and put all of the major changes into ext4.
With the current ext4 code's performance and feature advantages, and widespread use in newer distros, I don't think there is a good reason to make major modifications to ext3. It should remain as the stable legacy filesystem for some number of releases, and then possibly we should just remove one or both of ext2 and ext3 and use the ext4 code for both.
Given that Google is running ext4 in no-journal mode (i.e. ext2-like) on many thousands of machines, and getting _much_ better performance than the old ext2 code, it doesn't make sense to continue investing so much maintenance effort into ext2 and ext3.
Cheers, Andreas
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] ext3: Add batched discard support for ext3
2010-11-25 18:29 ` Andreas Dilger
@ 2010-11-26 8:01 ` Lukas Czerner
2010-11-26 10:33 ` Andreas Dilger
0 siblings, 1 reply; 17+ messages in thread
From: Lukas Czerner @ 2010-11-26 8:01 UTC (permalink / raw)
To: Andreas Dilger
Cc: Jan Kara, Lukas Czerner, Ext4 Developers List, linux-fsdevel
On Thu, 25 Nov 2010, Andreas Dilger wrote:
> On 2010-11-25, at 07:27, Lukas Czerner wrote:
> >> Walk through allocation groups and trim all free extents. It can be
> >> invoked through FITRIM ioctl on the file system. The main idea is to
> >> provide a way to trim the whole file system if needed, since some SSD's
> >> may suffer from performance loss after the whole device was filled (it
> >> does not mean that fs is full!).
>
> One question I have is why a major change like this is being done in ext3 instead of only in ext4? There are a lot of ext4 features that _could_ be included in ext3 (basically all of them), but the request from the rest of the kernel developers was to leave ext3 with minimal changes (for continued stability), and put all of the major changes into ext4.
You're right that ext3 should be "closed" for major (intrusive) changes,
however this is not a major change at all. It interacts with ext3 code
just very little and it is very well separated. Basically, it is dead
code until FITRIM ioctl is done.
Given that there are still a lot of people using ext3, and SSD's are
here right now, we should consider providing that feature to ext3 as
well. When you comprehend this features intrusiveness (well separated)
and it's usefulness into consideration, I think that the ratio is well
in favour of including this into ext3.
>
> With the current ext4 code's performance and feature advantages, and widespread use in newer distros, I don't think there is a good reason to make major modifications to ext3. It should remain as the stable legacy filesystem for some number of releases, and then possibly we should just remove one or both of ext2 and ext3 and use the ext4 code for both.
>
> Given that Google is running ext4 in no-journal mode (i.e. ext2-like) on many thousands of machines, and getting _much_ better performance than the old ext2 code, it doesn't make sense to continue investing so much maintenance effort into ext2 and ext3.
>
> Cheers, Andreas
Thanks!
-Lukas
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] ext3: Add batched discard support for ext3
2010-11-26 8:01 ` Lukas Czerner
@ 2010-11-26 10:33 ` Andreas Dilger
0 siblings, 0 replies; 17+ messages in thread
From: Andreas Dilger @ 2010-11-26 10:33 UTC (permalink / raw)
To: Lukas Czerner; +Cc: Jan Kara, Ext4 Developers List, linux-fsdevel
On 2010-11-26, at 01:01, Lukas Czerner wrote:
> On Thu, 25 Nov 2010, Andreas Dilger wrote:
>> One question I have is why a change like this is being done in ext3 instead of only in ext4? There are a lot of ext4 features that _could_ be included in ext3 (basically all of them), but the request from the rest of the kernel developers was to leave ext3 with minimal changes (for continued stability), and put all of the major changes into ext4.
>
> You're right that ext3 should be "closed" for major (intrusive) changes,
> however this is not a major change at all. It interacts with ext3 code
> just very little and it is very well separated. Basically, it is dead
> code until FITRIM ioctl is done.
I don't want to pick on this patch, per se, but if someone is installing a new filesystem on an SSD with a new kernel, they could just as easily use ext4 for that instead of ext3, and they likely should use ext4 because of performance. Either they are using ext3 because they don't want any changes (in which case they also don't want this one), or if it is a new filesystem on a new device they can as easily use ext4.
Since this code is only going to end up in new kernels, this isn't about what users are using for legacy systems where only ext3 is available.
Cheers, Andreas
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] ext3: Add batched discard support for ext3
2010-11-25 14:27 ` Lukas Czerner
2010-11-25 18:29 ` Andreas Dilger
@ 2011-01-06 14:26 ` Jan Kara
1 sibling, 0 replies; 17+ messages in thread
From: Jan Kara @ 2011-01-06 14:26 UTC (permalink / raw)
To: Lukas Czerner; +Cc: Jan Kara, linux-ext4, greg.freemyer, linux-fsdevel
Hi,
On Thu 25-11-10 15:27:09, Lukas Czerner wrote:
> > +/**
> > + * ext3_trim_fs() -- trim ioctl handle function
> > + * @sb: superblock for filesystem
> > + * @start: First Byte to trim
> > + * @len: number of Bytes to trim from start
> > + * @minlen: minimum extent length in Bytes
> > + *
> > + * ext3_trim_fs goes through all allocation groups containing Bytes from
> > + * start to start+len. For each such a group ext3_trim_all_free function
> > + * is invoked to trim all free space.
> > + */
> > +int ext3_trim_fs(struct super_block *sb, struct fstrim_range *range)
> > +{
> > + ext3_grpblk_t last_block, first_block, free_blocks;
> > + unsigned long first_group, last_group;
> > + unsigned long group, ngroups;
> > + struct ext3_group_desc *gdp;
> > + struct ext3_super_block *es;
> > + uint64_t start, len, minlen, trimmed;
> > + int ret = 0;
> We probably need to add this:
>
> ext3_fsblk_t blocks_count = le32_to_cpu(EXT3_SB(sb)->s_es->s_blocks_count);
>
> > +
> > + start = range->start >> sb->s_blocksize_bits;
> > + len = range->len >> sb->s_blocksize_bits;
> > + minlen = range->minlen >> sb->s_blocksize_bits;
> > + trimmed = 0;
> and this:
>
> if (len > blocks_count)
> len = blocks_count - start;
Thanks for letting me know. The above could go negative if start is too
big. So I've ended up with checks:
+ if (start >= max_blks)
+ goto out;
+ if (start + len > max_blks)
+ len = max_blks - start;
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2011-01-06 14:26 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-11-22 11:29 [PATCH 1/2] ext3: Add batched discard support for ext3 Lukas Czerner
2010-11-22 11:29 ` [PATCH 2/2] ext3: Add FITRIM handle " Lukas Czerner
2010-11-22 16:13 ` Greg Freemyer
2010-11-23 11:06 ` Lukas Czerner
2010-11-22 17:42 ` Jan Kara
2010-11-23 10:32 ` Lukas Czerner
2010-11-24 13:01 ` Jan Kara
2010-11-24 14:32 ` Lukas Czerner
2010-11-24 16:32 ` Jan Kara
-- strict thread matches above, loose matches on Subject: below --
2010-11-24 13:06 [PATCH 0/2] Ext3 discard support Jan Kara
2010-11-24 13:06 ` [PATCH 1/2] ext3: Add batched discard support for ext3 Jan Kara
2010-11-24 14:56 ` Lukas Czerner
2010-11-24 16:35 ` Jan Kara
2010-11-25 14:27 ` Lukas Czerner
2010-11-25 18:29 ` Andreas Dilger
2010-11-26 8:01 ` Lukas Czerner
2010-11-26 10:33 ` Andreas Dilger
2011-01-06 14:26 ` Jan Kara
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).