* [RFC] implementation of delayed extent list
@ 2011-08-30 9:00 Yongqiang Yang
2011-08-30 9:00 ` [RFC PATCH 1/5] ext4: add two structures delayed extent list nees Yongqiang Yang
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: Yongqiang Yang @ 2011-08-30 9:00 UTC (permalink / raw)
To: linux-ext4; +Cc: achender, adityakali, tytso
Hi,
This patch series implements a delayed extent list; more analysis is
in the 3rd patch.
Any feedback or suggestions are welcome.
Yongqiang.
[RFC PATCH 1/5] ext4: add two structures delayed extent list nees
[RFC PATCH 2/5] ext4: add a delayed extents list in inode
[RFC PATCH 3/5] ext4: add operations on delayed extent tree
[RFC PATCH 4/5] ext4: let ext4 maintian delayed extent lists
[RFC PATCH 5/5] ext4: reimplement fiemap on delayed extent list
* [RFC PATCH 1/5] ext4: add two structures delayed extent list nees
2011-08-30 9:00 [RFC] implementation of delayed extent list Yongqiang Yang
@ 2011-08-30 9:00 ` Yongqiang Yang
2011-08-30 9:00 ` [RFC PATCH 2/5] ext4: add a delayed extents list in inode Yongqiang Yang
` (3 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Yongqiang Yang @ 2011-08-30 9:00 UTC (permalink / raw)
To: linux-ext4; +Cc: achender, adityakali, tytso, Yongqiang Yang
This patch adds the two structures that the delayed extent list needs.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
---
fs/ext4/delayed_extents.h | 22 ++++++++++++++++++++++
1 files changed, 22 insertions(+), 0 deletions(-)
create mode 100644 fs/ext4/delayed_extents.h
diff --git a/fs/ext4/delayed_extents.h b/fs/ext4/delayed_extents.h
new file mode 100644
index 0000000..b7baaef
--- /dev/null
+++ b/fs/ext4/delayed_extents.h
@@ -0,0 +1,22 @@
+/*
+ * fs/ext4/delayed_extents.h
+ *
+ * Written by Yongqiang Yang <xiaoqiangnk@gmail.com>
+ *
+ */
+
+#ifndef _EXT4_DELAYED_EXTENTS_H
+#define _EXT4_DELAYED_EXTENTS_H
+
+struct ext4_delayed_extent {
+ struct list_head de_list;
+ ext4_lblk_t de_start; /* first block extent covers */
+ ext4_lblk_t de_end; /* block at which extent ends */
+};
+
+struct ext4_delayed_info {
+ struct list_head de_list;
+ struct ext4_delayed_extent *cache_de; /* recently accessed extent */
+};
+
+#endif /* _EXT4_DELAYED_EXTENTS_H */
--
1.7.5.1
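For readers following along, the range convention of the two fields can be illustrated outside the kernel. The sketch below is a user-space stand-in, not code from this series. It assumes that de_end is exclusive, which is what the "[%u, %u)" debug output and the `end = start + len` arithmetic in patch 3 suggest. The names `lblk_t`, `de_len`, `de_contains` and `de_adjacent` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t lblk_t;	/* assumed stand-in for ext4_lblk_t */

/* User-space stand-in for struct ext4_delayed_extent, without list linkage. */
struct delayed_extent {
	lblk_t start;	/* first block the extent covers */
	lblk_t end;	/* first block past the extent (exclusive, assumed) */
};

/* Number of blocks in the half-open range [start, end). */
static inline lblk_t de_len(const struct delayed_extent *de)
{
	return de->end - de->start;
}

/* True if blk lies inside [start, end). */
static inline bool de_contains(const struct delayed_extent *de, lblk_t blk)
{
	return blk >= de->start && blk < de->end;
}

/* True if b begins exactly where a ends, so the two could be merged. */
static inline bool de_adjacent(const struct delayed_extent *a,
			       const struct delayed_extent *b)
{
	return a->end == b->start;
}
```

With an exclusive end, two extents are mergeable exactly when one's end equals the other's start, which is the comparison the merge helpers in patch 3 perform.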
* [RFC PATCH 2/5] ext4: add a delayed extents list in inode
2011-08-30 9:00 [RFC] implementation of delayed extent list Yongqiang Yang
2011-08-30 9:00 ` [RFC PATCH 1/5] ext4: add two structures delayed extent list nees Yongqiang Yang
@ 2011-08-30 9:00 ` Yongqiang Yang
2011-08-30 9:00 ` [RFC PATCH 3/5] ext4: add operations on delayed extent tree Yongqiang Yang
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Yongqiang Yang @ 2011-08-30 9:00 UTC (permalink / raw)
To: linux-ext4; +Cc: achender, adityakali, tytso, Yongqiang Yang
This patch adds a delayed extents list to the inode, so that
delayed extents can be looked up quickly. Without the list,
delayed extents are identified by looking up the page cache.
With the list, FIEMAP, SEEK_HOLE/DATA, bigalloc and the writeout
path can be optimized a lot.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
---
fs/ext4/ext4.h | 5 +++++
fs/ext4/super.c | 2 ++
2 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b4bff1a..6c5bfd6 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -759,6 +759,8 @@ struct ext4_ext_cache {
__u32 ec_len; /* must be 32bit to return holes */
};
+#include "delayed_extents.h"
+
/*
* fourth extended file system inode data in memory
*/
@@ -836,6 +838,9 @@ struct ext4_inode_info {
struct list_head i_prealloc_list;
spinlock_t i_prealloc_lock;
+ /* delayed_extents.c */
+ struct ext4_delayed_info i_delayed_info;
+
/* ialloc */
ext4_group_t i_last_alloc_group;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c3e3a80..f0b6fe6 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -880,6 +880,8 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
ei->vfs_inode.i_version = 1;
ei->vfs_inode.i_data.writeback_index = 0;
memset(&ei->i_cached_extent, 0, sizeof(struct ext4_ext_cache));
+ memset(&ei->i_delayed_info, 0, sizeof(struct ext4_delayed_info));
+ INIT_LIST_HEAD(&ei->i_delayed_info.de_list);
INIT_LIST_HEAD(&ei->i_prealloc_list);
spin_lock_init(&ei->i_prealloc_lock);
ei->i_reserved_data_blocks = 0;
--
1.7.5.1
* [RFC PATCH 3/5] ext4: add operations on delayed extent tree
2011-08-30 9:00 [RFC] implementation of delayed extent list Yongqiang Yang
2011-08-30 9:00 ` [RFC PATCH 1/5] ext4: add two structures delayed extent list nees Yongqiang Yang
2011-08-30 9:00 ` [RFC PATCH 2/5] ext4: add a delayed extents list in inode Yongqiang Yang
@ 2011-08-30 9:00 ` Yongqiang Yang
2011-09-08 18:58 ` Jan Kara
2011-09-09 9:32 ` Lukas Czerner
2011-08-30 9:00 ` [RFC PATCH 4/5] ext4: let ext4 maintian delayed extent lists Yongqiang Yang
2011-08-30 9:00 ` [RFC PATCH 5/5] ext4: reimplement fiemap on delayed extent list Yongqiang Yang
4 siblings, 2 replies; 10+ messages in thread
From: Yongqiang Yang @ 2011-08-30 9:00 UTC (permalink / raw)
To: linux-ext4; +Cc: achender, adityakali, tytso, Yongqiang Yang
This patch adds operations on a delayed extent list.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
---
fs/ext4/Makefile | 2 +-
fs/ext4/delayed_extents.c | 390 +++++++++++++++++++++++++++++++++++++++++++++
fs/ext4/delayed_extents.h | 25 +++
3 files changed, 416 insertions(+), 1 deletions(-)
create mode 100644 fs/ext4/delayed_extents.c
diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
index 56fd8f86..ee16ad3 100644
--- a/fs/ext4/Makefile
+++ b/fs/ext4/Makefile
@@ -7,7 +7,7 @@ obj-$(CONFIG_EXT4_FS) += ext4.o
ext4-y := balloc.o bitmap.o dir.o file.o fsync.o ialloc.o inode.o page-io.o \
ioctl.o namei.o super.o symlink.o hash.o resize.o extents.o \
ext4_jbd2.o migrate.o mballoc.o block_validity.o move_extent.o \
- mmp.o indirect.o
+ mmp.o indirect.o delayed_extents.o
ext4-$(CONFIG_EXT4_FS_XATTR) += xattr.o xattr_user.o xattr_trusted.o
ext4-$(CONFIG_EXT4_FS_POSIX_ACL) += acl.o
diff --git a/fs/ext4/delayed_extents.c b/fs/ext4/delayed_extents.c
new file mode 100644
index 0000000..6275b19
--- /dev/null
+++ b/fs/ext4/delayed_extents.c
@@ -0,0 +1,390 @@
+/*
+ * fs/ext4/delayed_extents.c
+ *
+ * Written by Yongqiang Yang <xiaoqiangnk@gmail.com>
+ *
+ * Ext4 delayed-extents-list core functions.
+ */
+
+#include "ext4.h"
+#include "delayed_extents.h"
+#include "ext4_extents.h"
+
+/*
+ * delayed extent implementation for ext4.
+ *
+ *
+ * ==========================================================================
+ * 1. Why a delayed extent implementation?
+ *
+ * Without delayed extents, ext4 identifies a delayed extent by looking up
+ * the page cache, which leads to complicated, buggy, and inefficient
+ * code.
+ *
+ * FIEMAP, SEEK_HOLE/DATA, bigalloc, punch hole and writeout all need to know
+ * whether a block or a range of blocks belongs to a delayed extent.
+ *
+ * Let us look at how they work without a delayed extent implementation.
+ * -- FIEMAP
+ * FIEMAP looks up the page cache to tell delayed allocations from holes.
+ *
+ * -- SEEK_HOLE/DATA
+ * SEEK_HOLE/DATA has the same problem as FIEMAP.
+ *
+ * -- bigalloc
+ * bigalloc looks up the page cache to figure out whether a block is already
+ * under delayed allocation, in order to determine whether quota must be
+ * reserved for the cluster.
+ *
+ * -- punch hole
+ * punch hole looks up the page cache to identify a delayed extent.
+ *
+ * -- writeout
+ * Writeout walks the whole page cache to see if a buffer is mapped. If
+ * there are not very many delayed buffers, this is time consuming.
+ *
+ * With a delayed extent implementation, FIEMAP, SEEK_HOLE/DATA, bigalloc and
+ * writeout can figure out whether a block or a range of blocks is under
+ * delayed allocation (belongs to a delayed extent) by searching the delayed
+ * extent list.
+ *
+ *
+ * ==========================================================================
+ * 2. ext4 delayed extents implementation
+ *
+ * -- delayed extent
+ * A delayed extent is a range of logically contiguous blocks that are
+ * under delayed allocation. Unlike an extent in ext4, a delayed extent
+ * is an in-memory struct with no corresponding on-disk data. There is
+ * no limit on the length of a delayed extent, so it can contain as many
+ * blocks as are logically contiguous.
+ *
+ * -- delayed extent list
+ * Every inode has a delayed extent list, and all blocks under delayed
+ * allocation are added to the list as delayed extents. Delayed extents in
+ * the list are sorted by logical block number.
+ *
+ * -- operations on a delayed extent list
+ * There are three operations on a delayed extent list: finding the next
+ * delayed extent, adding a space (a range of blocks) to a list, and
+ * removing a space from a list.
+ *
+ * -- race on a delayed extent list
+ * The list is protected by inode->i_data_sem, like the extent tree.
+ *
+ *
+ * ==========================================================================
+ * 3. performance analysis
+ * -- overhead
+ * 1. Apart from operations on a delayed extent list, we need to
+ * down_write(inode->i_data_sem) in the delayed write path to maintain the
+ * delayed extent list. This can affect parallel read-write and write-write.
+ *
+ * 2. There is a cached extent for write access, so if writes are not very
+ * random, add-space operations take O(1) time.
+ *
+ * -- gain
+ * 3. Code is much simpler, more readable, more maintainable and
+ * more efficient.
+ *
+ * 4. TODO
+ * -- cache_de in de_find_extent
+ * de_find_extent is used heavily by bigalloc, so de_find_extent should use
+ * cache_de.
+ *
+ * ==========================================================================
+ * 5. A more sophisticated data structure, such as a tree?
+ * Do we need an rb tree instead?
+ *
+ */
+static struct kmem_cache *ext4_de_cachep;
+
+int __init ext4_init_de(void)
+{
+ ext4_de_cachep = KMEM_CACHE(ext4_delayed_extent,
+ SLAB_RECLAIM_ACCOUNT);
+ if (ext4_de_cachep == NULL)
+ return -ENOMEM;
+ return 0;
+}
+
+void ext4_exit_de(void)
+{
+ kmem_cache_destroy(ext4_de_cachep);
+}
+
+#ifdef DE_DEBUG
+void ext4_de_print_list(struct inode *inode)
+{
+ struct ext4_delayed_info *delayed_info;
+ struct list_head *cur;
+
+ printk(KERN_DEBUG "delayed extents for inode %lu:", inode->i_ino);
+ delayed_info = &EXT4_I(inode)->i_delayed_info;
+ list_for_each(cur, &delayed_info->de_list) {
+ struct ext4_delayed_extent *de;
+ de = list_entry(cur, struct ext4_delayed_extent, de_list);
+ printk(KERN_DEBUG " [%u, %u)", de->de_start, de->de_end);
+ }
+ printk(KERN_DEBUG "\n");
+}
+#else
+#define ext4_de_print_list(inode)
+#endif
+
+#define list_tail(list, type, member) \
+ list_entry((list)->prev, type, member)
+
+#define list_next(head, elm, member) \
+ ((elm)->member.next != head ? \
+ list_entry((elm)->member.next, typeof(*elm), member) : NULL)
+
+#define list_prev(head, elm, member) \
+ ((elm)->member.prev != head ? \
+ list_entry((elm)->member.prev, typeof(*elm), member) : NULL)
+
+/*
+ * ext4_de_find_extent: find the 1st delayed extent covering @de->de_start
+ * if it exists, otherwise, the next extent after @de->de_start.
+ *
+ * @inode: the inode which owns delayed extents
+ * @de: on entry, @de->de_start is the logical block to search from
+ *
+ * Return the start of the next delayed extent after the found extent, or
+ * EXT_MAX_BLOCKS if there is none. The found extent is returned via @de.
+ */
+ext4_lblk_t ext4_de_find_extent(struct inode *inode,
+ struct ext4_delayed_extent *de)
+{
+ struct ext4_delayed_info *delayed_info;
+ struct list_head *cur;
+ struct ext4_delayed_extent *de1;
+
+ de->de_end = 0;
+ delayed_info = &EXT4_I(inode)->i_delayed_info;
+ if (list_empty(&delayed_info->de_list))
+ return EXT_MAX_BLOCKS;
+
+ de1 = list_tail(&delayed_info->de_list, struct ext4_delayed_extent,
+ de_list);
+ if (de->de_start >= de1->de_end)
+ return EXT_MAX_BLOCKS;
+
+ list_for_each(cur, &delayed_info->de_list) {
+ de1 = list_entry(cur, struct ext4_delayed_extent, de_list);
+ if (de->de_end != 0)
+ return de1->de_start;
+
+ if (de1->de_end <= de->de_start)
+ /* de1 lies entirely before the search block
+ * @de->de_start, so skip it.
+ * |------------|.........|
+ * de1 de->de_start
+ */
+ continue;
+ de->de_start = de1->de_start;
+ de->de_end = de1->de_end;
+ }
+
+ BUG_ON(de->de_end == 0);
+ return EXT_MAX_BLOCKS;
+}
+
+static struct ext4_delayed_extent *
+ext4_de_add_entry(struct list_head *head, ext4_lblk_t start, ext4_lblk_t end)
+{
+ struct ext4_delayed_extent *de;
+ de = kmem_cache_alloc(ext4_de_cachep, GFP_NOFS);
+ if (de == NULL)
+ return NULL;
+ de->de_start = start;
+ de->de_end = end;
+ INIT_LIST_HEAD(&de->de_list);
+ list_add(&de->de_list, head);
+ return de;
+}
+
+static void ext4_de_remove_entry(struct inode *inode,
+ struct ext4_delayed_extent *de)
+{
+ struct ext4_delayed_info *delayed_info = &EXT4_I(inode)->i_delayed_info;
+
+ list_del(&de->de_list);
+ kmem_cache_free(ext4_de_cachep, de);
+ if (delayed_info->cache_de == de)
+ delayed_info->cache_de = NULL;
+}
+
+static void ext4_de_try_to_merge_left(struct inode *inode,
+ struct ext4_delayed_extent *de)
+{
+ struct ext4_delayed_extent *de1;
+
+ de1 = list_prev(&EXT4_I(inode)->i_delayed_info.de_list, de, de_list);
+ if (de1 && de1->de_end == de->de_start) {
+ de->de_start = de1->de_start;
+ ext4_de_remove_entry(inode, de1);
+ }
+}
+
+static void ext4_de_try_to_merge_right(struct inode *inode,
+ struct ext4_delayed_extent *de)
+{
+ struct ext4_delayed_extent *de1;
+
+ de1 = list_next(&EXT4_I(inode)->i_delayed_info.de_list, de, de_list);
+ if (de1 && de1->de_start == de->de_end) {
+ de->de_end = de1->de_end;
+ ext4_de_remove_entry(inode, de1);
+ }
+}
+
+/*
+ * ext4_de_add_space() adds a space to a delayed extent list.
+ * The caller must hold inode->i_data_sem.
+ *
+ * Return 0 on success, error code on failure.
+ */
+int ext4_de_add_space(struct inode *inode, ext4_lblk_t start, ext4_lblk_t len)
+{
+ ext4_lblk_t end = start + len;
+ struct ext4_delayed_info *delayed_info;
+ struct ext4_delayed_extent *de;
+ struct list_head *cur;
+
+ /* ext4_de_add_space is called with len = 1. */
+ BUG_ON(len != 1);
+ delayed_info = &EXT4_I(inode)->i_delayed_info;
+ de = delayed_info->cache_de;
+ de_debug("add [%u, %u) to delayed extent list of inode %lu\n",
+ start, end, inode->i_ino);
+
+ if (de && de->de_end == start) {
+ de->de_end = end;
+ ext4_de_try_to_merge_right(inode, de);
+ goto out;
+ } else if (de && de->de_start == end) {
+ de->de_start = start;
+ ext4_de_try_to_merge_left(inode, de);
+ goto out;
+ } else if (de && de->de_start <= start && de->de_end >= end)
+ goto out;
+
+ /* We traverse the list backwards. */
+ list_for_each_prev(cur, &delayed_info->de_list) {
+ de = list_entry(cur, struct ext4_delayed_extent, de_list);
+ if (de->de_start > end)
+ /* de follows [start, end) and they are
+ * not contiguous.
+ * |------------|----|-----------|
+ * [start, end) de
+ */
+ continue;
+
+ if (de->de_end == start) {
+ /* de is followed by [start, end) and they are
+ * contiguous.
+ * |-----------|---------------|
+ * de [start, end)
+ */
+ de->de_end = end;
+ goto out;
+ } else if (de->de_start == end) {
+ /* de follows [start, end) and they are contiguous.
+ * |-------------|------------|
+ * [start, end) de
+ */
+ de->de_start = start;
+ ext4_de_try_to_merge_left(inode, de);
+ goto out;
+ } else if (start >= de->de_start && end <= de->de_end)
+ goto out;
+
+ break;
+ }
+
+ de = ext4_de_add_entry(cur, start, end);
+ if (!de)
+ return -ENOMEM;
+out:
+ delayed_info->cache_de = de;
+ ext4_de_print_list(inode);
+
+ return 0;
+}
+
+/*
+ * ext4_de_remove_space() removes a space from a delayed extent list.
+ * The caller must hold inode->i_data_sem.
+ *
+ * Return 0 on success, error code on failure.
+ */
+int ext4_de_remove_space(struct inode *inode, ext4_lblk_t start,
+ ext4_lblk_t len)
+{
+ ext4_lblk_t end = start + len;
+ struct ext4_delayed_info *delayed_info;
+ struct list_head *cur;
+
+ delayed_info = &EXT4_I(inode)->i_delayed_info;
+
+ de_debug("remove [%u, %u) from delayed extent list of inode %lu\n",
+ start, end, inode->i_ino);
+
+ list_for_each(cur, &delayed_info->de_list) {
+ struct ext4_delayed_extent *de, *new_de;
+ new_de = NULL;
+ de = list_entry(cur, struct ext4_delayed_extent, de_list);
+ if (de->de_end <= start)
+ /* de resides before [start, end)
+ * |---------|...|-----------|
+ * de [start, end)
+ */
+ continue;
+
+ if (de->de_start >= end)
+ /* de resides after [start, end)
+ * |---------|...|-----------|
+ * [start,end) de
+ */
+ break;
+
+ /* de and [start, end) intersect */
+ if (de->de_start < start && de->de_end <= end) {
+ /*
+ * |-----------------|--------|.........|
+ * de->de_start start de->de_end end
+ */
+ de->de_end = start;
+ start = de->de_end;
+ } else if ((de->de_start >= start && de->de_end <= end)) {
+ /*
+ * |............|--------|.........|
+ * start de->start de->end end
+ */
+ start = de->de_end;
+ cur = de->de_list.prev;
+ ext4_de_remove_entry(inode, de);
+ } else if (de->de_start < start && de->de_end > end) {
+ /*
+ * |------------|--------|---------|
+ * de->start start end de->de_end
+ */
+ if (!ext4_de_add_entry(&de->de_list, end, de->de_end))
+ return -ENOMEM;
+ de->de_end = start;
+ break;
+ } else if (de->de_start >= start && de->de_end > end) {
+ /*
+ * |.............|---------|--------|
+ * start de->de_start end de->de_end
+ */
+ de->de_start = end;
+ break;
+ } else
+ BUG();
+ }
+
+ ext4_de_print_list(inode);
+ return 0;
+}
diff --git a/fs/ext4/delayed_extents.h b/fs/ext4/delayed_extents.h
index b7baaef..dd3bdb5 100644
--- a/fs/ext4/delayed_extents.h
+++ b/fs/ext4/delayed_extents.h
@@ -8,6 +8,21 @@
#ifndef _EXT4_DELAYED_EXTENTS_H
#define _EXT4_DELAYED_EXTENTS_H
+/*
+ * Turn on DE_DEBUG to get lots of info about delayed extents operations.
+ */
+#define DE_DEBUG__
+#ifdef DE_DEBUG
+#define de_debug(f, a...) \
+ do { \
+ printk(KERN_DEBUG "EXT4-fs delayed-extents (%s, %d): " \
+ "%s:", __FILE__, __LINE__, __func__); \
+ printk(KERN_DEBUG f, ## a); \
+ } while (0)
+#else
+#define de_debug(f, a...)
+#endif
+
struct ext4_delayed_extent {
struct list_head de_list;
ext4_lblk_t de_start; /* first block extent covers */
@@ -19,4 +34,14 @@ struct ext4_delayed_info {
struct ext4_delayed_extent *cache_de; /* recently accessed extent */
};
+extern int __init ext4_init_de(void);
+extern void ext4_exit_de(void);
+
+extern int ext4_de_add_space(struct inode *inode, ext4_lblk_t start,
+ ext4_lblk_t len);
+extern int ext4_de_remove_space(struct inode *inode, ext4_lblk_t start,
+ ext4_lblk_t len);
+extern ext4_lblk_t ext4_de_find_extent(struct inode *inode,
+ struct ext4_delayed_extent *de);
+
#endif /* _EXT4_DELAYED_EXTENTS_H */
--
1.7.5.1
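The add-and-merge behaviour described in the header comment of delayed_extents.c can be sketched in user space. This is only an illustration under simplifying assumptions, not the code from the patch: it uses a singly linked list and malloc() instead of list_head and the slab cache, it accepts ranges of any length rather than the len == 1 case the patch asserts, and it leaves out the cache_de fast path. The names `de_node`, `de_add` and `de_count` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t lblk_t;	/* assumed stand-in for ext4_lblk_t */

struct de_node {
	lblk_t start, end;	/* half-open range [start, end) */
	struct de_node *next;	/* next extent, sorted by start */
};

/*
 * Add [start, end) to the sorted list at *head, merging with any extent
 * it touches or overlaps. Returns 0 on success, -1 if malloc() fails.
 */
static inline int de_add(struct de_node **head, lblk_t start, lblk_t end)
{
	struct de_node **link = head;
	struct de_node *cur, *node;

	/* Skip extents that end strictly before the new range begins. */
	while ((cur = *link) != NULL && cur->end < start)
		link = &cur->next;

	if (cur == NULL || cur->start > end) {
		/* Nothing touches the new range: insert a fresh node. */
		node = malloc(sizeof(*node));
		if (node == NULL)
			return -1;
		node->start = start;
		node->end = end;
		node->next = cur;
		*link = node;
		return 0;
	}

	/* cur touches or overlaps [start, end): grow it to cover both. */
	if (start < cur->start)
		cur->start = start;
	if (end > cur->end)
		cur->end = end;

	/* Absorb following extents that the grown extent now reaches. */
	while ((node = cur->next) != NULL && node->start <= cur->end) {
		if (node->end > cur->end)
			cur->end = node->end;
		cur->next = node->next;
		free(node);
	}
	return 0;
}

/* Count the extents on the list. */
static inline unsigned int de_count(const struct de_node *head)
{
	unsigned int n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return n;
}
```

Adding block 6 to a list holding [5, 6) and [7, 8) bridges the gap and leaves the single extent [5, 8), which corresponds to the merge-left and merge-right helpers in the patch.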
* [RFC PATCH 4/5] ext4: let ext4 maintian delayed extent lists
2011-08-30 9:00 [RFC] implementation of delayed extent list Yongqiang Yang
` (2 preceding siblings ...)
2011-08-30 9:00 ` [RFC PATCH 3/5] ext4: add operations on delayed extent tree Yongqiang Yang
@ 2011-08-30 9:00 ` Yongqiang Yang
2011-08-30 9:00 ` [RFC PATCH 5/5] ext4: reimplement fiemap on delayed extent list Yongqiang Yang
4 siblings, 0 replies; 10+ messages in thread
From: Yongqiang Yang @ 2011-08-30 9:00 UTC (permalink / raw)
To: linux-ext4; +Cc: achender, adityakali, tytso, Yongqiang Yang
This patch lets ext4 maintain delayed extent lists.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
---
fs/ext4/ext4.h | 1 +
fs/ext4/extents.c | 2 ++
fs/ext4/indirect.c | 3 +++
fs/ext4/inode.c | 28 ++++++++++++++++++++++++++--
fs/ext4/super.c | 12 +++++++++++-
5 files changed, 43 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 6c5bfd6..c16b6e9 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -518,6 +518,7 @@ struct ext4_new_group_data {
#define EXT4_GET_BLOCKS_PUNCH_OUT_EXT 0x0020
/* Don't normalize allocation size (used for fallocate) */
#define EXT4_GET_BLOCKS_NO_NORMALIZE 0x0040
+#define EXT4_GET_BLOCKS_DEALLOC 0x0080
/*
* Flags used by ext4_free_blocks
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 475ddf5..c9ad622 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3688,6 +3688,8 @@ void ext4_ext_truncate(struct inode *inode)
last_block = (inode->i_size + sb->s_blocksize - 1)
>> EXT4_BLOCK_SIZE_BITS(sb);
+ err = ext4_de_remove_space(inode, last_block,
+ EXT_MAX_BLOCKS - last_block);
err = ext4_ext_remove_space(inode, last_block);
/* In a multi-transaction truncate, we only make the final
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 0962642..25cdb5b 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -22,6 +22,7 @@
#include <linux/module.h>
#include "ext4_jbd2.h"
+#include "ext4_extents.h"
#include "truncate.h"
#include <trace/events/ext4.h>
@@ -1383,6 +1384,8 @@ void ext4_ind_truncate(struct inode *inode)
down_write(&ei->i_data_sem);
ext4_discard_preallocations(inode);
+ ext4_de_remove_space(inode, last_block,
+ EXT_MAX_BLOCKS - last_block);
/*
* The orphan list entry will now protect us from any crash which
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c3ce754..bcf5257 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -445,7 +445,15 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
up_read((&EXT4_I(inode)->i_data_sem));
if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
- int ret = check_block_validity(inode, map);
+ int ret;
+ if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) {
+ /* delayed-allocation blocks may have been allocated by
+ * fallocate, so the delayed extent list must be updated here.
+ */
+ down_write((&EXT4_I(inode)->i_data_sem));
+ goto delayed_mapped;
+ }
+ ret = check_block_validity(inode, map);
if (ret != 0)
return ret;
}
@@ -520,8 +528,18 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE))
ext4_da_update_reserve_space(inode, retval, 1);
}
- if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
+ if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) {
ext4_clear_inode_state(inode, EXT4_STATE_DELALLOC_RESERVED);
+delayed_mapped:
+ if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
+ int ret;
+ /* delayed allocation blocks have been allocated */
+ ret = ext4_de_remove_space(inode, map->m_lblk,
+ map->m_len);
+ if (ret < 0)
+ retval = ret;
+ }
+ }
up_write((&EXT4_I(inode)->i_data_sem));
if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
@@ -1633,6 +1651,12 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
/* not enough space to reserve */
return ret;
+ down_write((&EXT4_I(inode)->i_data_sem));
+ ret = ext4_de_add_space(inode, map.m_lblk, map.m_len);
+ up_write((&EXT4_I(inode)->i_data_sem));
+ if (ret)
+ return ret;
+
map_bh(bh, inode->i_sb, invalid_block);
set_buffer_new(bh);
set_buffer_delay(bh);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index f0b6fe6..e607823 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -49,6 +49,7 @@
#include "xattr.h"
#include "acl.h"
#include "mballoc.h"
+#include "ext4_extents.h"
#define CREATE_TRACE_POINTS
#include <trace/events/ext4.h>
@@ -968,6 +969,7 @@ void ext4_clear_inode(struct inode *inode)
end_writeback(inode);
dquot_drop(inode);
ext4_discard_preallocations(inode);
+ ext4_de_remove_space(inode, 0, EXT_MAX_BLOCKS);
if (EXT4_I(inode)->jinode) {
jbd2_journal_release_jbd_inode(EXT4_JOURNAL(inode),
EXT4_I(inode)->jinode);
@@ -4985,9 +4987,14 @@ static int __init ext4_init_fs(void)
init_waitqueue_head(&ext4__ioend_wq[i]);
}
- err = ext4_init_pageio();
+ err = ext4_init_de();
if (err)
return err;
+
+ err = ext4_init_pageio();
+ if (err)
+ goto out8;
+
err = ext4_init_system_zone();
if (err)
goto out7;
@@ -5039,6 +5046,9 @@ out6:
ext4_exit_system_zone();
out7:
ext4_exit_pageio();
+out8:
+ ext4_exit_de();
+
return err;
}
--
1.7.5.1
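This patch calls ext4_de_remove_space() from the truncate paths, from ext4_clear_inode(), and once delayed blocks have been mapped. The effect of a removal on one extent falls into the cases drawn in the comments of patch 3: no overlap, trim on the right, trim on the left, split in two, or complete removal. The sketch below shows these cases in user space under the assumption that ranges are half-open. It is an illustration only, and `de_subtract` is a hypothetical helper that does not exist in the series.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t lblk_t;	/* assumed stand-in for ext4_lblk_t */

struct delayed_extent {
	lblk_t start, end;	/* half-open range [start, end) */
};

/*
 * Remove [start, end) from de. The surviving pieces are written to out[]
 * in ascending order and their count (0, 1 or 2) is returned.
 */
static inline int de_subtract(struct delayed_extent de, lblk_t start,
			      lblk_t end, struct delayed_extent out[2])
{
	int n = 0;

	if (end <= de.start || start >= de.end) {
		/* The ranges do not intersect: de survives unchanged. */
		out[n++] = de;
		return n;
	}
	if (de.start < start) {
		/* A piece of de remains to the left of the removed range. */
		out[n].start = de.start;
		out[n].end = start;
		n++;
	}
	if (de.end > end) {
		/* A piece of de remains to the right of the removed range. */
		out[n].start = end;
		out[n].end = de.end;
		n++;
	}
	return n;
}
```

Truncation corresponds to removing [last_block, EXT_MAX_BLOCKS), so every extent is either kept, trimmed on the right, or dropped. The split case only arises for removals in the middle of an extent, and it is the one case where the kernel version has to allocate and can therefore fail with -ENOMEM.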
* [RFC PATCH 5/5] ext4: reimplement fiemap on delayed extent list
2011-08-30 9:00 [RFC] implementation of delayed extent list Yongqiang Yang
` (3 preceding siblings ...)
2011-08-30 9:00 ` [RFC PATCH 4/5] ext4: let ext4 maintian delayed extent lists Yongqiang Yang
@ 2011-08-30 9:00 ` Yongqiang Yang
4 siblings, 0 replies; 10+ messages in thread
From: Yongqiang Yang @ 2011-08-30 9:00 UTC (permalink / raw)
To: linux-ext4; +Cc: achender, adityakali, tytso, Yongqiang Yang
This patch reimplements fiemap on top of the delayed extent list.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
---
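As a reading aid, the decision the new callback makes for an unmapped range can be sketched in user space. The sketch assumes a sorted, non-overlapping array of half-open delayed extents in place of the kernel list. The names `classify_unmapped`, `BLK_HOLE` and `BLK_DELALLOC` are hypothetical, and the code illustrates the idea rather than reproducing the callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t lblk_t;	/* assumed stand-in for ext4_lblk_t */

struct delayed_extent {
	lblk_t start, end;	/* half-open range [start, end) */
};

enum blk_kind { BLK_HOLE, BLK_DELALLOC };

/*
 * Classify the unmapped range that begins at blk. des[] must be sorted
 * and non-overlapping. *len receives the number of blocks sharing the
 * classification; for holes it is capped at max_len.
 */
static inline enum blk_kind classify_unmapped(const struct delayed_extent *des,
					      size_t n, lblk_t blk,
					      lblk_t max_len, lblk_t *len)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (des[i].end <= blk)
			continue;	/* extent lies entirely before blk */
		if (des[i].start > blk) {
			/* Hole that ends where the next delayed extent starts. */
			lblk_t gap = des[i].start - blk;

			*len = gap < max_len ? gap : max_len;
			return BLK_HOLE;
		}
		/* des[i] covers blk: delayed up to the end of the extent. */
		*len = des[i].end - blk;
		return BLK_DELALLOC;
	}
	/* No delayed extent at or after blk: the whole range is a hole. */
	*len = max_len;
	return BLK_HOLE;
}
```

The three outcomes match the three branches in the diff: no delayed extent found, a delayed extent that starts after the queried block, and a delayed extent that covers it, in which case FIEMAP_EXTENT_DELALLOC is set.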
fs/ext4/extents.c | 186 +++++++----------------------------------------------
1 files changed, 23 insertions(+), 163 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index c9ad622..1c7ded8 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3909,193 +3909,53 @@ static int ext4_ext_fiemap_cb(struct inode *inode, ext4_lblk_t next,
struct ext4_ext_cache *newex, struct ext4_extent *ex,
void *data)
{
+ struct ext4_delayed_extent de;
__u64 logical;
__u64 physical;
__u64 length;
__u32 flags = 0;
+ ext4_lblk_t next_del;
int ret = 0;
struct fiemap_extent_info *fieinfo = data;
unsigned char blksize_bits;
- blksize_bits = inode->i_sb->s_blocksize_bits;
- logical = (__u64)newex->ec_block << blksize_bits;
+ de.de_start = newex->ec_block;
+ down_read(&EXT4_I(inode)->i_data_sem);
+ next_del = ext4_de_find_extent(inode, &de);
+ up_read(&EXT4_I(inode)->i_data_sem);
+ next = min(next_del, next);
if (newex->ec_start == 0) {
/*
* No extent in extent-tree contains block @newex->ec_start,
* then the block may stay in 1)a hole or 2)delayed-extent.
- *
- * Holes or delayed-extents are processed as follows.
- * 1. lookup dirty pages with specified range in pagecache.
- * If no page is got, then there is no delayed-extent and
- * return with EXT_CONTINUE.
- * 2. find the 1st mapped buffer,
- * 3. check if the mapped buffer is both in the request range
- * and a delayed buffer. If not, there is no delayed-extent,
- * then return.
- * 4. a delayed-extent is found, the extent will be collected.
*/
- ext4_lblk_t end = 0;
- pgoff_t last_offset;
- pgoff_t offset;
- pgoff_t index;
- pgoff_t start_index = 0;
- struct page **pages = NULL;
- struct buffer_head *bh = NULL;
- struct buffer_head *head = NULL;
- unsigned int nr_pages = PAGE_SIZE / sizeof(struct page *);
-
- pages = kmalloc(PAGE_SIZE, GFP_KERNEL);
- if (pages == NULL)
- return -ENOMEM;
-
- offset = logical >> PAGE_SHIFT;
-repeat:
- last_offset = offset;
- head = NULL;
- ret = find_get_pages_tag(inode->i_mapping, &offset,
- PAGECACHE_TAG_DIRTY, nr_pages, pages);
-
- if (!(flags & FIEMAP_EXTENT_DELALLOC)) {
- /* First time, try to find a mapped buffer. */
- if (ret == 0) {
-out:
- for (index = 0; index < ret; index++)
- page_cache_release(pages[index]);
- /* just a hole. */
- kfree(pages);
- return EXT_CONTINUE;
- }
- index = 0;
-
-next_page:
- /* Try to find the 1st mapped buffer. */
- end = ((__u64)pages[index]->index << PAGE_SHIFT) >>
- blksize_bits;
- if (!page_has_buffers(pages[index]))
- goto out;
- head = page_buffers(pages[index]);
- if (!head)
- goto out;
-
- index++;
- bh = head;
- do {
- if (end >= newex->ec_block +
- newex->ec_len)
- /* The buffer is out of
- * the request range.
- */
- goto out;
-
- if (buffer_mapped(bh) &&
- end >= newex->ec_block) {
- start_index = index - 1;
- /* get the 1st mapped buffer. */
- goto found_mapped_buffer;
- }
-
- bh = bh->b_this_page;
- end++;
- } while (bh != head);
-
- /* No mapped buffer in the range found in this page,
- * We need to look up next page.
- */
- if (index >= ret) {
- /* There is no page left, but we need to limit
- * newex->ec_len.
- */
- newex->ec_len = end - newex->ec_block;
- goto out;
- }
- goto next_page;
- } else {
- /*Find contiguous delayed buffers. */
- if (ret > 0 && pages[0]->index == last_offset)
- head = page_buffers(pages[0]);
- bh = head;
- index = 1;
- start_index = 0;
- }
-
-found_mapped_buffer:
- if (bh != NULL && buffer_delay(bh)) {
- /* 1st or contiguous delayed buffer found. */
- if (!(flags & FIEMAP_EXTENT_DELALLOC)) {
- /*
- * 1st delayed buffer found, record
- * the start of extent.
- */
- flags |= FIEMAP_EXTENT_DELALLOC;
- newex->ec_block = end;
- logical = (__u64)end << blksize_bits;
- }
- /* Find contiguous delayed buffers. */
- do {
- if (!buffer_delay(bh))
- goto found_delayed_extent;
- bh = bh->b_this_page;
- end++;
- } while (bh != head);
-
- for (; index < ret; index++) {
- if (!page_has_buffers(pages[index])) {
- bh = NULL;
- break;
- }
- head = page_buffers(pages[index]);
- if (!head) {
- bh = NULL;
- break;
- }
-
- if (pages[index]->index !=
- pages[start_index]->index + index
- - start_index) {
- /* Blocks are not contiguous. */
- bh = NULL;
- break;
- }
- bh = head;
- do {
- if (!buffer_delay(bh))
- /* Delayed-extent ends. */
- goto found_delayed_extent;
- bh = bh->b_this_page;
- end++;
- } while (bh != head);
- }
- } else if (!(flags & FIEMAP_EXTENT_DELALLOC))
- /* a hole found. */
- goto out;
-
-found_delayed_extent:
- newex->ec_len = min(end - newex->ec_block,
- (ext4_lblk_t)EXT_INIT_MAX_LEN);
- if (ret == nr_pages && bh != NULL &&
- newex->ec_len < EXT_INIT_MAX_LEN &&
- buffer_delay(bh)) {
- /* Have not collected an extent and continue. */
- for (index = 0; index < ret; index++)
- page_cache_release(pages[index]);
- goto repeat;
+ if (de.de_end == 0)
+ /* A hole found. */
+ return EXT_CONTINUE;
+
+ if (de.de_start > newex->ec_block) {
+ /* A hole found. */
+ newex->ec_len = min(de.de_start - newex->ec_block,
+ newex->ec_len);
+ return EXT_CONTINUE;
}
- for (index = 0; index < ret; index++)
- page_cache_release(pages[index]);
- kfree(pages);
+ flags |= FIEMAP_EXTENT_DELALLOC;
+ newex->ec_len = de.de_end - newex->ec_block;
}
- physical = (__u64)newex->ec_start << blksize_bits;
- length = (__u64)newex->ec_len << blksize_bits;
* Re: [RFC PATCH 3/5] ext4: add operations on delayed extent tree
2011-08-30 9:00 ` [RFC PATCH 3/5] ext4: add operations on delayed extent tree Yongqiang Yang
@ 2011-09-08 18:58 ` Jan Kara
2011-09-09 1:14 ` Yongqiang Yang
2011-09-09 2:46 ` Yongqiang Yang
2011-09-09 9:32 ` Lukas Czerner
1 sibling, 2 replies; 10+ messages in thread
From: Jan Kara @ 2011-09-08 18:58 UTC (permalink / raw)
To: Yongqiang Yang; +Cc: linux-ext4, achender, adityakali, tytso
On Tue 30-08-11 17:00:19, Yongqiang Yang wrote:
> This patch adds operations on a delayed extent tree.
>
> Signed-off-by; Yongqiang Yang <xiaoqiangnk@gmail.com>
>
> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
I don't have time to do a full review of the series now but I definitely
think you should use RB tree to track extents. For big files written in
random manner we will just burn lots of CPU cycles needlessly...
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
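The cost in question comes from the list walk: with n delayed extents, each lookup that misses the cached extent is O(n), while an ordered structure such as an rb-tree brings it down to O(log n). The user-space sketch below shows the same ordered lookup with a binary search over a sorted array. The array is only a stand-in chosen to keep the example short. It is not the kernel rbtree API, and `de_lower_bound` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t lblk_t;	/* assumed stand-in for ext4_lblk_t */

struct delayed_extent {
	lblk_t start, end;	/* half-open range [start, end) */
};

/*
 * Return the index of the first extent whose end lies beyond blk, that is
 * the extent covering blk or else the next one after it. Returns n if no
 * such extent exists. des[] must be sorted and non-overlapping.
 */
static inline size_t de_lower_bound(const struct delayed_extent *des,
				    size_t n, lblk_t blk)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (des[mid].end <= blk)
			lo = mid + 1;	/* des[mid] is entirely before blk */
		else
			hi = mid;	/* des[mid] covers or follows blk */
	}
	return lo;
}
```

This is the same "covering extent, otherwise the next one" query that ext4_de_find_extent() answers by walking the list from the head.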
* Re: [RFC PATCH 3/5] ext4: add operations on delayed extent tree
2011-09-08 18:58 ` Jan Kara
@ 2011-09-09 1:14 ` Yongqiang Yang
2011-09-09 2:46 ` Yongqiang Yang
1 sibling, 0 replies; 10+ messages in thread
From: Yongqiang Yang @ 2011-09-09 1:14 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-ext4, achender, adityakali, tytso
On Fri, Sep 9, 2011 at 2:58 AM, Jan Kara <jack@suse.cz> wrote:
> On Tue 30-08-11 17:00:19, Yongqiang Yang wrote:
>> This patch adds operations on a delayed extent tree.
>>
>> Signed-off-by; Yongqiang Yang <xiaoqiangnk@gmail.com>
>>
>> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
> I don't have time to do a full review of the series now but I definitely
> think you should use RB tree to track extents. For big files written in
> random manner we will just burn lots of CPU cycles needlessly...
I planned to use an RB tree instead; I sent out this series to collect feedback.
Thank you.
Yongqiang.
>
> Honza
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
>
--
Best Wishes
Yongqiang Yang
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [RFC PATCH 3/5] ext4: add operations on delayed extent tree
2011-09-08 18:58 ` Jan Kara
2011-09-09 1:14 ` Yongqiang Yang
@ 2011-09-09 2:46 ` Yongqiang Yang
1 sibling, 0 replies; 10+ messages in thread
From: Yongqiang Yang @ 2011-09-09 2:46 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-ext4, achender, adityakali, tytso
On Fri, Sep 9, 2011 at 2:58 AM, Jan Kara <jack@suse.cz> wrote:
> On Tue 30-08-11 17:00:19, Yongqiang Yang wrote:
>> This patch adds operations on a delayed extent tree.
>>
>> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
> I don't have time to do a full review of the series now but I definitely
> think you should use RB tree to track extents. For big files written in
> random manner we will just burn lots of CPU cycles needlessly...
BTW: Do we need to use an RB tree for preallocations as well? I
suspect that preallocations have the same problem as the delayed extent
list.
Yongqiang.
>
> Honza
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
>
--
Best Wishes
Yongqiang Yang
* Re: [RFC PATCH 3/5] ext4: add operations on delayed extent tree
2011-08-30 9:00 ` [RFC PATCH 3/5] ext4: add operations on delayed extent tree Yongqiang Yang
2011-09-08 18:58 ` Jan Kara
@ 2011-09-09 9:32 ` Lukas Czerner
1 sibling, 0 replies; 10+ messages in thread
From: Lukas Czerner @ 2011-09-09 9:32 UTC (permalink / raw)
To: Yongqiang Yang; +Cc: linux-ext4, achender, adityakali, tytso
On Tue, 30 Aug 2011, Yongqiang Yang wrote:
> This patch adds operations on a delayed extent tree.
>
> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
> ---
> fs/ext4/Makefile | 2 +-
> fs/ext4/delayed_extents.c | 390 +++++++++++++++++++++++++++++++++++++++++++++
> fs/ext4/delayed_extents.h | 25 +++
> 3 files changed, 416 insertions(+), 1 deletions(-)
> create mode 100644 fs/ext4/delayed_extents.c
>
> diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
> index 56fd8f86..ee16ad3 100644
> --- a/fs/ext4/Makefile
> +++ b/fs/ext4/Makefile
> @@ -7,7 +7,7 @@ obj-$(CONFIG_EXT4_FS) += ext4.o
> ext4-y := balloc.o bitmap.o dir.o file.o fsync.o ialloc.o inode.o page-io.o \
> ioctl.o namei.o super.o symlink.o hash.o resize.o extents.o \
> ext4_jbd2.o migrate.o mballoc.o block_validity.o move_extent.o \
> - mmp.o indirect.o
> + mmp.o indirect.o delayed_extents.o
>
> ext4-$(CONFIG_EXT4_FS_XATTR) += xattr.o xattr_user.o xattr_trusted.o
> ext4-$(CONFIG_EXT4_FS_POSIX_ACL) += acl.o
> diff --git a/fs/ext4/delayed_extents.c b/fs/ext4/delayed_extents.c
> new file mode 100644
> index 0000000..6275b19
> --- /dev/null
> +++ b/fs/ext4/delayed_extents.c
> @@ -0,0 +1,390 @@
> +/*
> + * fs/ext4/delayed_extents.c
> + *
> + * Written by Yongqiang Yang <xiaoqiangnk@gmail.com>
> + *
> + * Ext4 delayed-extents-list core functions.
> + */
> +
> +#include "ext4.h"
> +#include "delayed_extents.h"
> +#include "ext4_extents.h"
> +
> +/*
> + * delayed extent implementation for ext4.
> + *
> + *
> + * ==========================================================================
> + * 1. Why delayed extent implementation ?
> + *
> + * Without delayed extents, ext4 identifies a delayed extent by looking up
> + * the page cache, which leads to complicated, buggy, and inefficient code.
> + *
> + * FIEMAP, SEEK_HOLE/DATA, bigalloc, punch hole and writeout all need to know
> + * whether a block or a range of blocks belongs to a delayed extent.
> + *
> + * Let us have a look at how they work without the delayed extents
> + * implementation.
> + * -- FIEMAP
> + * FIEMAP looks up page cache to identify delayed allocations from holes.
> + *
> + * -- SEEK_HOLE/DATA
> + * SEEK_HOLE/DATA has the same problem as FIEMAP.
> + *
> + * -- bigalloc
> + * bigalloc looks up page cache to figure out if a block is already
> + * under delayed allocation or not to determine whether quota reserving
> + * is needed for the cluster.
> + *
> + * -- punch hole
> + * punch hole looks up page cache to identify a delayed extent.
> + *
> + * -- writeout
> + * Writeout looks up the whole page cache to see if a buffer is mapped. If
> + * there are not very many delayed buffers, this is time consuming.
> + *
> + * With the delayed extents implementation, FIEMAP, SEEK_HOLE/DATA, bigalloc
> + * and writeout can figure out whether a block or a range of blocks is under
> + * delayed allocation (belongs to a delayed extent) by searching the delayed
> + * extent list.
> + *
> + *
> + * ==========================================================================
> + * 2. ext4 delayed extents implementation
> + *
> + * -- delayed extent
> + * A delayed extent is a range of blocks which are logically contiguous and
> + * under delayed allocation. Unlike an extent in ext4, a delayed extent is
> + * an in-memory struct; there is no corresponding on-disk data. There is
> + * no limit on the length of a delayed extent, so a delayed extent can
> + * contain as many blocks as are logically contiguous.
> + *
> + * -- delayed extent list
> + * Every inode has a delayed extent list, and all blocks under delayed
> + * allocation are added to the list as delayed extents. Delayed extents in
> + * the list are sorted by logical block number.
> + *
> + * -- operations on a delayed extent list
> + * There are three operations on a delayed extent list: finding the next
> + * delayed extent, adding a space (a range of blocks) to a list and removing
> + * a space from a list.
> + *
> + * -- race on a delayed extent list
> + * The delayed extent list is protected by inode->i_data_sem, like the
> + * extent tree.
> + *
> + *
> + * ==========================================================================
> + * 3. performance analysis
> + * -- overhead
> + * 1. Apart from operations on a delayed extent list, we need to
> + * down_write(inode->i_data_sem) in the delayed write path to maintain the
> + * delayed extent list; this can have an impact on parallel read-write and
> + * write-write.
> + *
> + * 2. There is a cache extent for write access, so if writes are not very
> + * random, adding space operations are in O(1) time.
> + *
> + * -- gain
> + * 3. Code is much simpler, more readable, more maintainable and
> + * more efficient.
> + *
> + * 4. TODO
> + * -- cache_de in de_find_extent
> + * de_find_extent is used heavily by bigalloc, so de_find_extent should use
> + * cache_de.
> + *
> + * ==========================================================================
> + * 5. More sophisticated data structure like a tree?
> + * Do we need an rb tree instead?
> + *
> + */
> +static struct kmem_cache *ext4_de_cachep;
> +
> +int __init ext4_init_de(void)
> +{
> + ext4_de_cachep = KMEM_CACHE(ext4_delayed_extent,
> + SLAB_RECLAIM_ACCOUNT);
> + if (ext4_de_cachep == NULL)
> + return -ENOMEM;
> + return 0;
> +}
> +
> +void ext4_exit_de(void)
> +{
> + kmem_cache_destroy(ext4_de_cachep);
> +}
> +
> +#ifdef DE_DEBUG
> +void ext4_de_print_list(struct inode *inode)
> +{
> + struct ext4_delayed_info *delayed_info;
> + struct list_head *cur;
> +
> + printk(KERN_DEBUG "delayed extents for inode %lu:", inode->i_ino);
> + delayed_info = &EXT4_I(inode)->i_delayed_info;
> + list_for_each(cur, &delayed_info->de_list) {
> + struct ext4_delayed_extent *de;
> + de = list_entry(cur, struct ext4_delayed_extent, de_list);
> + printk(KERN_DEBUG " [%u, %u)", de->de_start, de->de_end);
> + }
> + printk(KERN_DEBUG "\n");
> +}
> +#else
> +#define ext4_de_print_list(inode)
> +#endif
> +
> +#define list_tail(list, type, member) \
> + list_entry((list)->prev, type, member)
> +
> +#define list_next(head, elm, member) \
> + ((elm)->member.next != head ? \
> + list_entry((elm)->member.next, typeof(*elm), member) : NULL)
> +
> +#define list_prev(head, elm, member) \
> + ((elm)->member.prev != head ? \
> + list_entry((elm)->member.prev, typeof(*elm), member) : NULL)
> +
> +/*
> + * ext4_de_find_extent: find the 1st delayed extent covering @de->de_start
> + * if it exists, otherwise, the next extent after @de->de_start.
> + *
> + * @inode: the inode which owns delayed extents
> + * @de: on entry, @de->de_start is the logical block to search from; on
> + * return, @de holds the found extent (@de->de_end is 0 if none was found)
> + *
> + * Return the start block of the next delayed extent after the found one,
> + * or EXT_MAX_BLOCKS if there is none.
> + */
> +ext4_lblk_t ext4_de_find_extent(struct inode *inode,
> + struct ext4_delayed_extent *de)
> +{
> + struct ext4_delayed_info *delayed_info;
> + struct list_head *cur;
> + struct ext4_delayed_extent *de1;
> +
> + de->de_end = 0;
> + delayed_info = &EXT4_I(inode)->i_delayed_info;
> + if (list_empty(&delayed_info->de_list))
> + return EXT_MAX_BLOCKS;
> +
> + de1 = list_tail(&delayed_info->de_list, struct ext4_delayed_extent,
> + de_list);
> + if (de->de_start >= de1->de_end)
> + return EXT_MAX_BLOCKS;
> +
> + list_for_each(cur, &delayed_info->de_list) {
> + de1 = list_entry(cur, struct ext4_delayed_extent, de_list);
> + if (de->de_end != 0)
> + return de1->de_start;
> +
> + if (de1->de_end <= de->de_start)
> + /* de1 lies entirely before the search block
> + * de->de_start, so skip it.
> + * |------------|.........|
> + * de1 de->de_start
> + */
> + continue;
> + de->de_start = de1->de_start;
> + de->de_end = de1->de_end;
> + }
> +
> + BUG_ON(de->de_end == 0);
> + return EXT_MAX_BLOCKS;
> +}
> +
> +static struct ext4_delayed_extent *
> +ext4_de_add_entry(struct list_head *head, ext4_lblk_t start, ext4_lblk_t end)
> +{
> + struct ext4_delayed_extent *de;
> + de = kmem_cache_alloc(ext4_de_cachep, GFP_NOFS);
> + if (de == NULL)
> + return NULL;
> + de->de_start = start;
> + de->de_end = end;
> + INIT_LIST_HEAD(&de->de_list);
> + list_add(&de->de_list, head);
> + return de;
> +}
> +
> +static void ext4_de_remove_entry(struct inode *inode,
> + struct ext4_delayed_extent *de)
> +{
> + struct ext4_delayed_info *delayed_info = &EXT4_I(inode)->i_delayed_info;
> +
> + list_del(&de->de_list);
> + kmem_cache_free(ext4_de_cachep, de);
> + if (delayed_info->cache_de == de)
> + delayed_info->cache_de = NULL;
> +}
> +
> +static void ext4_de_try_to_merge_left(struct inode *inode,
> + struct ext4_delayed_extent *de)
> +{
> + struct ext4_delayed_extent *de1;
> +
> + de1 = list_prev(&EXT4_I(inode)->i_delayed_info.de_list, de, de_list);
> + if (de1 && de1->de_end == de->de_start) {
> + de->de_start = de1->de_start;
> + ext4_de_remove_entry(inode, de1);
> + }
> +}
> +
> +static void ext4_de_try_to_merge_right(struct inode *inode,
> + struct ext4_delayed_extent *de)
> +{
> + struct ext4_delayed_extent *de1;
> +
> + de1 = list_next(&EXT4_I(inode)->i_delayed_info.de_list, de, de_list);
> + if (de1 && de1->de_start == de->de_end) {
> + de->de_end = de1->de_end;
> + ext4_de_remove_entry(inode, de1);
> + }
> +}
> +
> +/*
> + * ext4_de_add_space() adds a space to a delayed extent list.
> + * Caller holds inode->i_data_sem.
> + *
> + * Return 0 on success, error code on failure.
> + */
> +int ext4_de_add_space(struct inode *inode, ext4_lblk_t start, ext4_lblk_t len)
> +{
> + ext4_lblk_t end = start + len;
In the first patch you are saying that:
ext4_lblk_t de_end; /* block at which extent ends */
which implies to me that the de_end is the last block of the extent. But
start+len certainly is not the last block, but rather last+1.
I am mentioning this because we have had similar problems in the past
where the documentation said that EXT_MAX_BLOCK was a maximum logical
block, but it was in fact used as a maximum count of blocks and that
caused problems. So I would like to be clear on this, what the "end"
actually really means to avoid confusion.
Also in ext4_extent we are using "start" and "len" approach, I can not
say which is better to use, but it might be better to be consistent in
ext4 code.
Thanks!
-Lukas
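To make the two conventions Lukas contrasts concrete, here is a minimal userspace sketch. It assumes the patch's `de_end` is exclusive (one past the last block), which is what `end = start + len` implies. The type and function names (`lblk_t`, `range_end`, `range_len`, `to_end`, `to_len`, `last_block`) are hypothetical and are not part of ext4.

```c
#include <stdint.h>

typedef uint32_t lblk_t;	/* hypothetical stand-in for ext4_lblk_t */

/* Half-open form, as the patch appears to use it: covers [start, end). */
struct range_end {
	lblk_t start;	/* first block covered */
	lblk_t end;	/* one past the last block covered */
};

/* start/len form, the approach Lukas mentions for ext4_extent. */
struct range_len {
	lblk_t start;	/* first block covered */
	lblk_t len;	/* number of blocks covered */
};

/* Convert start/len to half-open: end is start + len, not the last block. */
static struct range_end to_end(struct range_len r)
{
	struct range_end e = { r.start, r.start + r.len };
	return e;
}

/* Convert half-open back to start/len. */
static struct range_len to_len(struct range_end e)
{
	struct range_len r = { e.start, e.end - e.start };
	return r;
}

/* Last block actually covered; only meaningful when end > start. */
static lblk_t last_block(struct range_end e)
{
	return e.end - 1;
}
```

For a one-block range starting at block 5, `to_end()` gives `end == 6` while `last_block()` gives 5, which is exactly the off-by-one that an ambiguous "block at which extent ends" comment can hide.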
> + struct ext4_delayed_info *delayed_info;
> + struct ext4_delayed_extent *de;
> + struct list_head *cur;
> +
> + /* ext4_de_add_space is called with len = 1. */
> + BUG_ON(len != 1);
> + delayed_info = &EXT4_I(inode)->i_delayed_info;
> + de = delayed_info->cache_de;
> + de_debug("add [%u, %u) to delayed extent list of inode %lu\n",
> + start, end, inode->i_ino);
> +
> + if (de && de->de_end == start) {
> + de->de_end = end;
> + ext4_de_try_to_merge_right(inode, de);
> + goto out;
> + } else if (de && de->de_start == end) {
> + de->de_start = start;
> + ext4_de_try_to_merge_left(inode, de);
> + goto out;
> + } else if (de && de->de_start <= start && de->de_end >= end)
> + goto out;
> +
> + /* We traverse the list backwards. */
> + list_for_each_prev(cur, &delayed_info->de_list) {
> + de = list_entry(cur, struct ext4_delayed_extent, de_list);
> + if (de->de_start > end)
> + /* de follows [start, end) and they are
> + * not contiguous.
> + * |------------|----|-----------|
> + * [start, end) de
> + */
> + continue;
> +
> + if (de->de_end == start) {
> + /* de is followed by [start, end) and they are
> + * contiguous.
> + * |-----------|---------------|
> + * de [start, end)
> + */
> + de->de_end = end;
> + goto out;
> + } else if (de->de_start == end) {
> + /* de follows [start, end) and they are contiguous.
> + * |-------------|------------|
> + * [start, end) de
> + */
> + de->de_start = start;
> + ext4_de_try_to_merge_left(inode, de);
> + goto out;
> + } else if (start >= de->de_start && end <= de->de_end)
> + goto out;
> +
> + break;
> + }
> +
> + de = ext4_de_add_entry(cur, start, end);
> + if (!de)
> + return -ENOMEM;
> +out:
> + delayed_info->cache_de = de;
> + ext4_de_print_list(inode);
> +
> + return 0;
> +}
> +
> +/*
> + * ext4_de_remove_space() removes a space from a delayed extent list.
> + * Caller holds inode->i_data_sem.
> + *
> + * Return 0 on success, error code on failure.
> + */
> +int ext4_de_remove_space(struct inode *inode, ext4_lblk_t start,
> + ext4_lblk_t len)
> +{
> + ext4_lblk_t end = start + len;
> + struct ext4_delayed_info *delayed_info;
> + struct list_head *cur;
> +
> + delayed_info = &EXT4_I(inode)->i_delayed_info;
> +
> + de_debug("remove [%u, %u) from delayed extent list of inode %lu\n",
> + start, end, inode->i_ino);
> +
> + list_for_each(cur, &delayed_info->de_list) {
> + struct ext4_delayed_extent *de, *new_de;
> + new_de = NULL;
> + de = list_entry(cur, struct ext4_delayed_extent, de_list);
> + if (de->de_end <= start)
> + /* de resides before [start, end)
> + * |---------|...|-----------|
> + * de [start, end)
> + */
> + continue;
> +
> + if (de->de_start >= end)
> + /* de resides after [start, end)
> + * |---------|...|-----------|
> + * [start,end) de
> + */
> + break;
> +
> + /* de and [start, end) intersect */
> + if (de->de_start < start && de->de_end <= end) {
> + /*
> + * |-----------------|--------|.........|
> + * de->de_start start de->de_end end
> + */
> + de->de_end = start;
> + start = de->de_end;
> + } else if ((de->de_start >= start && de->de_end <= end)) {
> + /*
> + * |............|--------|.........|
> + * start de->start de->end end
> + */
> + start = de->de_end;
> + cur = de->de_list.prev;
> + ext4_de_remove_entry(inode, de);
> + } else if (de->de_start < start && de->de_end > end) {
> + /*
> + * |------------|--------|---------|
> + * de->start start end de->de_end
> + */
> + if (!ext4_de_add_entry(&de->de_list, end, de->de_end))
> + return -ENOMEM;
> + de->de_end = start;
> + break;
> + } else if (de->de_start >= start && de->de_end > end) {
> + /*
> + * |.............|---------|--------|
> + * start de->de_start end de->de_end
> + */
> + de->de_start = end;
> + break;
> + } else
> + BUG();
> + }
> +
> + ext4_de_print_list(inode);
> + return 0;
> +}
> diff --git a/fs/ext4/delayed_extents.h b/fs/ext4/delayed_extents.h
> index b7baaef..dd3bdb5 100644
> --- a/fs/ext4/delayed_extents.h
> +++ b/fs/ext4/delayed_extents.h
> @@ -8,6 +8,21 @@
> #ifndef _EXT4_DELAYED_EXTENTS_H
> #define _EXT4_DELAYED_EXTENTS_H
>
> +/*
> + * Turn on DE_DEBUG to get lots of info about delayed extents operations.
> + */
> +#define DE_DEBUG__
> +#ifdef DE_DEBUG
> +#define de_debug(f, a...) \
> + do { \
> + printk(KERN_DEBUG "EXT4-fs delayed-extents (%s, %d): " \
> + "%s:", __FILE__, __LINE__, __func__); \
> + printk(KERN_DEBUG f, ## a); \
> + } while (0)
> +#else
> +#define de_debug(f, a...)
> +#endif
> +
> struct ext4_delayed_extent {
> struct list_head de_list;
> ext4_lblk_t de_start; /* first block extent covers */
> @@ -19,4 +34,14 @@ struct ext4_delayed_info {
> struct ext4_delayed_extent *cache_de; /* recently accessed extent */
> };
>
> +extern int __init ext4_init_de(void);
> +extern void ext4_exit_de(void);
> +
> +extern int ext4_de_add_space(struct inode *inode, ext4_lblk_t start,
> + ext4_lblk_t len);
> +extern int ext4_de_remove_space(struct inode *inode, ext4_lblk_t start,
> + ext4_lblk_t len);
> +extern ext4_lblk_t ext4_de_find_extent(struct inode *inode,
> + struct ext4_delayed_extent *de);
> +
> #endif /* _EXT4_DELAYED_EXTENTS_H */
>
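The four intersecting cases that ext4_de_remove_space() distinguishes (trim the tail, drop the whole extent, split in two, trim the head) can be summarized as "keep whatever lies left of the removed range and whatever lies right of it". The following is a userspace sketch of that rule for a single extent, assuming half-open [start, end) ranges; `struct de_range` and `de_subtract()` are hypothetical names, and the list handling and allocation of the real code are left out.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for struct ext4_delayed_extent. */
struct de_range {
	uint32_t start;	/* first block covered */
	uint32_t end;	/* one past the last block covered */
};

/*
 * Remove [s, e) from extent @d.  The surviving pieces (0, 1 or 2 of them)
 * are written to @out in ascending order; the return value is their count.
 */
static int de_subtract(struct de_range d, uint32_t s, uint32_t e,
		       struct de_range out[2])
{
	int n = 0;

	/* No intersection: the extent survives unchanged. */
	if (d.end <= s || d.start >= e) {
		out[n++] = d;
		return n;
	}

	/* Part of the extent lies before the removed range: keep it. */
	if (d.start < s)
		out[n++] = (struct de_range){ d.start, s };

	/* Part of the extent lies after the removed range: keep it. */
	if (d.end > e)
		out[n++] = (struct de_range){ e, d.end };

	return n;
}
```

A return value of 2 corresponds to the split case, the only one in which the patch has to allocate a new entry and can therefore fail with -ENOMEM.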
--
Thread overview: 10+ messages
2011-08-30 9:00 [RFC] implementation of delayed extent list Yongqiang Yang
2011-08-30 9:00 ` [RFC PATCH 1/5] ext4: add two structures delayed extent list nees Yongqiang Yang
2011-08-30 9:00 ` [RFC PATCH 2/5] ext4: add a delayed extents list in inode Yongqiang Yang
2011-08-30 9:00 ` [RFC PATCH 3/5] ext4: add operations on delayed extent tree Yongqiang Yang
2011-09-08 18:58 ` Jan Kara
2011-09-09 1:14 ` Yongqiang Yang
2011-09-09 2:46 ` Yongqiang Yang
2011-09-09 9:32 ` Lukas Czerner
2011-08-30 9:00 ` [RFC PATCH 4/5] ext4: let ext4 maintian delayed extent lists Yongqiang Yang
2011-08-30 9:00 ` [RFC PATCH 5/5] ext4: reimplement fiemap on delayed extent list Yongqiang Yang