* [PATCH v2 01/23] btrfs: qgroup: New function declaration for new reserve implement
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
@ 2015-10-09 2:11 ` Qu Wenruo
2015-10-09 2:11 ` [PATCH v2 02/23] btrfs: qgroup: Implement data_rsv_map init/free functions Qu Wenruo
` (22 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:11 UTC (permalink / raw)
To: linux-btrfs
Add new structures and functions for the dirty phase of the new qgroup
reserve implementation.
The dirty phase focuses on avoiding over-reservation: for a dirty range
whose space is already reserved, we won't reserve space again.
This patch adds the needed structure declaration and comments.
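To make the dirty-phase rule concrete, here is a minimal, runnable
userspace C sketch of the bookkeeping described above (struct range and
uncovered() are illustrative names, not part of this patch):

#include <stdio.h>

struct range { unsigned long long start, len; };

/* Bytes of 'want' not yet covered by the already-reserved ranges. */
static unsigned long long uncovered(struct range want,
				    const struct range *rsv, int nr)
{
	unsigned long long missing = want.len;
	int i;

	for (i = 0; i < nr; i++) {
		unsigned long long lo = rsv[i].start > want.start ?
					rsv[i].start : want.start;
		unsigned long long hi_r = rsv[i].start + rsv[i].len;
		unsigned long long hi_w = want.start + want.len;
		unsigned long long hi = hi_r < hi_w ? hi_r : hi_w;

		if (hi > lo)
			missing -= hi - lo;
	}
	return missing;
}

int main(void)
{
	struct range rsv[] = { { 0, 8192 } };	/* step 1: wrote [0, 8K) */

	/* step 2: rewriting [0, 4K) needs no new reservation: prints 0 */
	printf("%llu\n", uncovered((struct range){ 0, 4096 }, rsv, 1));
	/* step 3: writing [12K, 16K) needs a fresh 4K: prints 4096 */
	printf("%llu\n", uncovered((struct range){ 12288, 4096 }, rsv, 1));
	return 0;
}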
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
Fix some comment spelling
---
fs/btrfs/btrfs_inode.h | 4 ++++
fs/btrfs/qgroup.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/qgroup.h | 3 +++
3 files changed, 65 insertions(+)
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 0ef5cc1..6d799b8 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -24,6 +24,7 @@
#include "extent_io.h"
#include "ordered-data.h"
#include "delayed-inode.h"
+#include "qgroup.h"
/*
* ordered_data_close is set by truncate when a file that used
@@ -193,6 +194,9 @@ struct btrfs_inode {
struct timespec i_otime;
struct inode vfs_inode;
+
+ /* qgroup dirty map for data space reserve */
+ struct btrfs_qgroup_data_rsv_map *qgroup_rsv_map;
};
extern unsigned char btrfs_filetype_table[];
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index e9ace09..607ace8 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -91,6 +91,64 @@ struct btrfs_qgroup {
u64 new_refcnt;
};
+/*
+ * Record one range of reserved space.
+ */
+struct data_rsv_range {
+ struct rb_node node;
+ u64 start;
+ u64 len;
+};
+
+/*
+ * Record per-inode reserved ranges.
+ * This is mainly used to resolve the reserved space leaking problem,
+ * one cause of which is a mismatch between reserve and free.
+ *
+ * The new qgroup code handles reservation in two phases.
+ * 1) Dirty phase.
+ *    Pages are just marked dirty, but not yet written to disk.
+ * 2) Flushed phase.
+ *    Pages are written to disk, but the transaction is not committed yet.
+ *
+ * In the dirty phase, we only need to focus on avoiding over-reservation.
+ *
+ * The idea is like below.
+ * 1) Write [0,8K)
+ * 0 4K 8K 12K 16K
+ * |////////////|
+ * Reserve +8K, total reserved: 8K
+ *
+ * 2) Write [0,4K)
+ * 0 4K 8K 12K 16K
+ * |////////////|
+ * Reserve 0, total reserved 8K
+ *
+ * 3) Write [12K,16K)
+ * 0 4K 8K 12K 16K
+ * |////////////| |///////|
+ * Reserve +4K, total reserved 12K
+ *
+ * 4) Flush [0,8K)
+ * Can happen without committing the transaction, e.g. fallocate will
+ * trigger the write.
+ * 0 4K 8K 12K 16K
+ * |///////|
+ * Reserve 0, total reserved 12K
+ * As the extent is written to disk and no longer dirty, the range gets
+ * removed.
+ * But as its delayed refs are not run yet, its reserved space will not
+ * be freed, and things continue to the flushed phase.
+ *
+ * By this method we can avoid over-reservation, which would otherwise
+ * lead to a reserved space leak.
+ */
+struct btrfs_qgroup_data_rsv_map {
+ struct rb_root root;
+ u64 reserved;
+ spinlock_t lock;
+};
+
static void btrfs_qgroup_update_old_refcnt(struct btrfs_qgroup *qg, u64 seq,
int mod)
{
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 6387dcf..2f863a4 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -33,6 +33,9 @@ struct btrfs_qgroup_extent_record {
struct ulist *old_roots;
};
+/* For per-inode dirty range reserve */
+struct btrfs_qgroup_data_rsv_map;
+
int btrfs_quota_enable(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info);
int btrfs_quota_disable(struct btrfs_trans_handle *trans,
--
2.6.1
* [PATCH v2 02/23] btrfs: qgroup: Implement data_rsv_map init/free functions
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
2015-10-09 2:11 ` [PATCH v2 01/23] btrfs: qgroup: New function declaration for new reserve implement Qu Wenruo
@ 2015-10-09 2:11 ` Qu Wenruo
2015-10-09 2:15 ` [PATCH v2 03/23] btrfs: qgroup: Introduce new function to search most left reserve range Qu Wenruo
` (21 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:11 UTC (permalink / raw)
To: linux-btrfs
Add new functions btrfs_qgroup_init/free_data_rsv_map() to init/free the
data reserve map.
The data reserve map is used to mark which ranges already hold reserved
space, to avoid the current reserved space leak.
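Since the map is allocated lazily at write time and the allocation may
sleep, the init function added below uses a check, allocate unlocked,
recheck pattern. A hedged userspace model of that pattern (all names
here are illustrative, not the kernel code):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_spinlock_t init_lock;
static void *rsv_map;			/* models binode->qgroup_rsv_map */

static int lazy_init_map(void)
{
	void *map;

	pthread_spin_lock(&init_lock);
	if (rsv_map) {			/* fast path: already set up */
		pthread_spin_unlock(&init_lock);
		return 0;
	}
	pthread_spin_unlock(&init_lock);

	/* slow path: allocate with the lock dropped */
	map = calloc(1, 64);
	if (!map)
		return -1;

	pthread_spin_lock(&init_lock);
	if (rsv_map)			/* lost the race: discard our copy */
		free(map);
	else
		rsv_map = map;
	pthread_spin_unlock(&init_lock);
	return 0;
}

int main(void)
{
	pthread_spin_init(&init_lock, PTHREAD_PROCESS_PRIVATE);
	printf("%d %d\n", lazy_init_map(), lazy_init_map());
	return 0;
}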
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
Add a reserved space leak check at free_data_rsv_map() time
---
fs/btrfs/btrfs_inode.h | 2 ++
fs/btrfs/inode.c | 10 ++++++
fs/btrfs/qgroup.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/qgroup.h | 3 ++
4 files changed, 99 insertions(+)
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 6d799b8..c2da3a9 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -197,6 +197,8 @@ struct btrfs_inode {
/* qgroup dirty map for data space reserve */
struct btrfs_qgroup_data_rsv_map *qgroup_rsv_map;
+ /* lock to ensure rsv_map will only be initialized once */
+ spinlock_t qgroup_init_lock;
};
extern unsigned char btrfs_filetype_table[];
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b7e439b..79ad301 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8940,6 +8940,14 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
INIT_LIST_HEAD(&ei->delalloc_inodes);
RB_CLEAR_NODE(&ei->rb_node);
+ /*
+ * Init the qgroup info to empty, as it will be initialized at write
+ * time.
+ * This behavior is needed for the case where quota is enabled later.
+ */
+ spin_lock_init(&ei->qgroup_init_lock);
+ ei->qgroup_rsv_map = NULL;
+
return inode;
}
@@ -8997,6 +9005,8 @@ void btrfs_destroy_inode(struct inode *inode)
btrfs_put_ordered_extent(ordered);
}
}
+ /* free and check data rsv map */
+ btrfs_qgroup_free_data_rsv_map(inode);
inode_tree_del(inode);
btrfs_drop_extent_cache(inode, 0, (u64)-1, 0);
free:
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 607ace8..c275312 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2539,3 +2539,87 @@ btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
btrfs_queue_work(fs_info->qgroup_rescan_workers,
&fs_info->qgroup_rescan_work);
}
+
+/*
+ * Init data_rsv_map for a given inode.
+ *
+ * This is needed at write time as quota can be disabled and then enabled
+ */
+int btrfs_qgroup_init_data_rsv_map(struct inode *inode)
+{
+ struct btrfs_inode *binode = BTRFS_I(inode);
+ struct btrfs_root *root = binode->root;
+ struct btrfs_qgroup_data_rsv_map *dirty_map;
+
+ if (!root->fs_info->quota_enabled || !is_fstree(root->objectid))
+ return 0;
+
+ spin_lock(&binode->qgroup_init_lock);
+ /* Fast path: already initialized */
+ if (likely(binode->qgroup_rsv_map))
+ goto out;
+ spin_unlock(&binode->qgroup_init_lock);
+
+ /*
+ * Slow allocation path
+ *
+ * TODO: Use a kmem_cache to speed up the allocation
+ */
+ dirty_map = kmalloc(sizeof(*dirty_map), GFP_NOFS);
+ if (!dirty_map)
+ return -ENOMEM;
+
+ dirty_map->reserved = 0;
+ dirty_map->root = RB_ROOT;
+ spin_lock_init(&dirty_map->lock);
+
+ /* Lock again to ensure no one else initialized it meanwhile */
+ spin_lock(&binode->qgroup_init_lock);
+ if (binode->qgroup_rsv_map) {
+ spin_unlock(&binode->qgroup_init_lock);
+ kfree(dirty_map);
+ return 0;
+ }
+ binode->qgroup_rsv_map = dirty_map;
+out:
+ spin_unlock(&binode->qgroup_init_lock);
+ return 0;
+}
+
+void btrfs_qgroup_free_data_rsv_map(struct inode *inode)
+{
+ struct btrfs_inode *binode = BTRFS_I(inode);
+ struct btrfs_root *root = binode->root;
+ struct btrfs_qgroup_data_rsv_map *dirty_map = binode->qgroup_rsv_map;
+ struct rb_node *node;
+
+ /*
+ * This function is called from the inode destroy routine, so no
+ * concurrency will happen; there is no need to take the lock.
+ */
+ if (!dirty_map)
+ return;
+
+ /* sanity check */
+ WARN_ON(!is_fstree(root->objectid));
+
+ /* Reserve map should be empty, or we are leaking */
+ WARN_ON(dirty_map->reserved);
+
+ btrfs_qgroup_free(root, dirty_map->reserved);
+ spin_lock(&dirty_map->lock);
+ while ((node = rb_first(&dirty_map->root)) != NULL) {
+ struct data_rsv_range *range;
+
+ range = rb_entry(node, struct data_rsv_range, node);
+ btrfs_warn(root->fs_info,
+ "leaking reserved range, root: %llu, ino: %lu, start: %llu, len: %llu\n",
+ root->objectid, inode->i_ino, range->start,
+ range->len);
+ rb_erase(node, &dirty_map->root);
+ kfree(range);
+ }
+ spin_unlock(&dirty_map->lock);
+ kfree(dirty_map);
+ binode->qgroup_rsv_map = NULL;
+}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 2f863a4..c87b7dc 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -84,4 +84,7 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid,
u64 rfer, u64 excl);
#endif
+/* for qgroup reserve */
+int btrfs_qgroup_init_data_rsv_map(struct inode *inode);
+void btrfs_qgroup_free_data_rsv_map(struct inode *inode);
#endif /* __BTRFS_QGROUP__ */
--
2.6.1
* [PATCH v2 03/23] btrfs: qgroup: Introduce new function to search most left reserve range
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
2015-10-09 2:11 ` [PATCH v2 01/23] btrfs: qgroup: New function declaration for new reserve implement Qu Wenruo
2015-10-09 2:11 ` [PATCH v2 02/23] btrfs: qgroup: Implement data_rsv_map init/free functions Qu Wenruo
@ 2015-10-09 2:15 ` Qu Wenruo
2015-10-09 2:15 ` [PATCH v2 04/23] btrfs: qgroup: Introduce function to insert non-overlap " Qu Wenruo
` (20 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:15 UTC (permalink / raw)
To: linux-btrfs
Introduce a new function to search for the left-most reserved range in a
reserve map.
It provides the basis for the later reserve map implementation.
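For intuition, the rb-tree walk below behaves like the following binary
search over a sorted array (a sketch with illustrative names; as with
the real function, the caller must still check whether the returned
range actually covers start):

#include <stddef.h>

struct range { unsigned long long start, len; };

/*
 * Return the last range whose start is <= target; if every range
 * starts after target, return the left-most one.
 */
static struct range *find_left(struct range *r, int nr,
			       unsigned long long target)
{
	int lo = 0, hi = nr - 1, best = -1;

	if (nr == 0)
		return NULL;		/* empty "tree" */
	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;

		if (r[mid].start <= target) {
			best = mid;
			lo = mid + 1;
		} else {
			hi = mid - 1;
		}
	}
	return best >= 0 ? &r[best] : &r[0];
}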
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
fs/btrfs/qgroup.c | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index c275312..c771029 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2541,6 +2541,42 @@ btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
}
/*
+ * Return the nearest range at or to the left of the given start.
+ * There is no guarantee that the returned range covers start.
+ */
+static struct data_rsv_range *
+find_reserve_range(struct btrfs_qgroup_data_rsv_map *map, u64 start)
+{
+ struct rb_node **p = &map->root.rb_node;
+ struct rb_node *parent = NULL;
+ struct rb_node *prev = NULL;
+ struct data_rsv_range *range = NULL;
+
+ while (*p) {
+ parent = *p;
+ range = rb_entry(parent, struct data_rsv_range, node);
+ if (range->start < start)
+ p = &(*p)->rb_right;
+ else if (range->start > start)
+ p = &(*p)->rb_left;
+ else
+ return range;
+ }
+
+ /* empty tree */
+ if (!parent)
+ return NULL;
+ if (range->start <= start)
+ return range;
+
+ prev = rb_prev(parent);
+ /* Already the left-most one */
+ if (!prev)
+ return range;
+ return rb_entry(prev, struct data_rsv_range, node);
+}
+
+/*
* Init data_rsv_map for a given inode.
*
* This is needed at write time as quota can be disabled and then enabled
--
2.6.1
* [PATCH v2 04/23] btrfs: qgroup: Introduce function to insert non-overlap reserve range
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (2 preceding siblings ...)
2015-10-09 2:15 ` [PATCH v2 03/23] btrfs: qgroup: Introduce new function to search most left reserve range Qu Wenruo
@ 2015-10-09 2:15 ` Qu Wenruo
2015-10-09 2:15 ` [PATCH v2 05/23] btrfs: qgroup: Introduce function to reserve data range per inode Qu Wenruo
` (19 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:15 UTC (permalink / raw)
To: linux-btrfs
The new function insert_data_ranges() will insert non-overlapping
reserved ranges into the reserve map.
It provides the basis for the later qgroup reserve map implementation.
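The insertion boils down to a four-way merge decision against the two
adjacent ranges, as sketched below (illustrative names, not the kernel
code; the MERGE_NONE case is where the preallocated tmp node is
consumed):

enum merge_kind { MERGE_NONE, MERGE_PREV, MERGE_NEXT, MERGE_BOTH };

static enum merge_kind classify(int has_prev, unsigned long long prev_end,
				int has_next, unsigned long long next_start,
				unsigned long long start,
				unsigned long long len)
{
	int touch_prev = has_prev && prev_end == start;
	int touch_next = has_next && start + len == next_start;

	if (touch_prev && touch_next)
		return MERGE_BOTH;	/* fuse all three, free the next node */
	if (touch_prev)
		return MERGE_PREV;	/* prev->len += len */
	if (touch_next)
		return MERGE_NEXT;	/* next->start = start; next->len += len */
	return MERGE_NONE;		/* link the preallocated 'tmp' node */
}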
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
Fix comment typo
---
fs/btrfs/qgroup.c | 124 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 124 insertions(+)
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index c771029..b690b02 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2577,6 +2577,130 @@ find_reserve_range(struct btrfs_qgroup_data_rsv_map *map, u64 start)
}
/*
+ * Insert one data range.
+ * The given [start, start + len) must not overlap with any existing range.
+ *
+ * Return 0 if the range is inserted and tmp is not used.
+ * Return > 0 if the range is inserted and tmp is used.
+ * There is no catchable error case; the only possible failure is a logic
+ * error, which triggers BUG_ON().
+ */
+static int insert_data_range(struct btrfs_qgroup_data_rsv_map *map,
+ struct data_rsv_range *tmp,
+ u64 start, u64 len)
+{
+ struct rb_node **p = &map->root.rb_node;
+ struct rb_node *parent = NULL;
+ struct rb_node *tmp_node = NULL;
+ struct data_rsv_range *range = NULL;
+ struct data_rsv_range *prev_range = NULL;
+ struct data_rsv_range *next_range = NULL;
+ int prev_merged = 0;
+ int next_merged = 0;
+ int ret = 0;
+
+ while (*p) {
+ parent = *p;
+ range = rb_entry(parent, struct data_rsv_range, node);
+ if (range->start < start)
+ p = &(*p)->rb_right;
+ else if (range->start > start)
+ p = &(*p)->rb_left;
+ else
+ BUG_ON(1);
+ }
+
+ /* Empty tree, goto isolated case */
+ if (!range)
+ goto insert_isolated;
+
+ /* get adjacent ranges */
+ if (range->start < start) {
+ prev_range = range;
+ tmp_node = rb_next(parent);
+ if (tmp_node)
+ next_range = rb_entry(tmp_node, struct data_rsv_range,
+ node);
+ } else {
+ next_range = range;
+ tmp_node = rb_prev(parent);
+ if (tmp_node)
+ prev_range = rb_entry(tmp_node, struct data_rsv_range,
+ node);
+ }
+
+ /* try to merge with previous and next ranges */
+ if (prev_range && prev_range->start + prev_range->len == start) {
+ prev_merged = 1;
+ prev_range->len += len;
+ }
+ if (next_range && start + len == next_range->start) {
+ next_merged = 1;
+
+ /*
+ * The range can be merged with both adjacent ranges into one;
+ * remove the trailing range.
+ */
+ if (prev_merged) {
+ prev_range->len += next_range->len;
+ rb_erase(&next_range->node, &map->root);
+ kfree(next_range);
+ } else {
+ next_range->start = start;
+ next_range->len += len;
+ }
+ }
+
+insert_isolated:
+ /* isolated case, need to insert range now */
+ if (!next_merged && !prev_merged) {
+ BUG_ON(!tmp);
+
+ tmp->start = start;
+ tmp->len = len;
+ rb_link_node(&tmp->node, parent, p);
+ rb_insert_color(&tmp->node, &map->root);
+ ret = 1;
+ }
+ return ret;
+}
+
+/*
+ * Insert reserved ranges and merge them where possible.
+ *
+ * Return 0 if all are inserted and tmp is not used.
+ * Return > 0 if all are inserted and tmp is used.
+ * There is no catchable error return value.
+ */
+static int insert_data_ranges(struct btrfs_qgroup_data_rsv_map *map,
+ struct data_rsv_range *tmp,
+ struct ulist *insert_list)
+{
+ struct ulist_node *unode;
+ struct ulist_iterator uiter;
+ int tmp_used = 0;
+ int ret = 0;
+
+ ULIST_ITER_INIT(&uiter);
+ while ((unode = ulist_next(insert_list, &uiter))) {
+ ret = insert_data_range(map, tmp, unode->val, unode->aux);
+
+ /*
+ * insert_data_range() won't return an error value, so there
+ * is no need to handle the <0 case.
+ *
+ * Also, tmp should be used at most once, so clear it to
+ * NULL to cooperate with the sanity check in insert_data_range().
+ */
+ if (ret > 0) {
+ tmp_used = 1;
+ tmp = NULL;
+ }
+ }
+ return tmp_used;
+}
+
+/*
* Init data_rsv_map for a given inode.
*
* This is needed at write time as quota can be disabled and then enabled
--
2.6.1
* [PATCH v2 05/23] btrfs: qgroup: Introduce function to reserve data range per inode
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (3 preceding siblings ...)
2015-10-09 2:15 ` [PATCH v2 04/23] btrfs: qgroup: Introduce function to insert non-overlap " Qu Wenruo
@ 2015-10-09 2:15 ` Qu Wenruo
2015-10-09 2:18 ` [PATCH v2 06/23] btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function Qu Wenruo
` (18 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:15 UTC (permalink / raw)
To: linux-btrfs
Introduce the new function reserve_data_range().
This function will find the not-yet-reserved ranges and insert them into
the reserve map, using the previously introduced functions.
This provides the basis for the later per-inode reserve map
implementation.
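What reserve_data_range() effectively does is walk the gaps of the
desired range that are not covered by existing reserved ranges. A
runnable userspace sketch of that gap walk (illustrative names; ranges
are assumed sorted and non-overlapping, as in the rb-tree):

#include <stdio.h>

struct range { unsigned long long start, len; };

/* Print the uncovered pieces of [start, start + len); return their total. */
static unsigned long long walk_gaps(const struct range *r, int nr,
				    unsigned long long start,
				    unsigned long long len)
{
	unsigned long long cur = start, end = start + len, total = 0;
	int i;

	for (i = 0; i < nr && cur < end; i++) {
		unsigned long long r_end = r[i].start + r[i].len;

		if (r_end <= cur)		/* range entirely left of cur */
			continue;
		if (r[i].start > cur) {		/* gap before this range */
			unsigned long long gap_end =
				r[i].start < end ? r[i].start : end;

			printf("reserve [%llu, %llu)\n", cur, gap_end);
			total += gap_end - cur;
		}
		cur = r_end > cur ? r_end : cur;
	}
	if (cur < end) {			/* tail gap after the last range */
		printf("reserve [%llu, %llu)\n", cur, end);
		total += end - cur;
	}
	return total;
}

int main(void)
{
	struct range rsv[] = { { 0, 4096 }, { 8192, 4096 } };

	/* prints reserve [4096, 8192), reserve [12288, 16384), total 8192 */
	printf("total %llu\n", walk_gaps(rsv, 2, 0, 16384));
	return 0;
}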
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
Add needed parameter for later trace functions
---
fs/btrfs/qgroup.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 95 insertions(+)
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index b690b02..3bdf28e 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2701,6 +2701,101 @@ static int insert_data_ranges(struct btrfs_qgroup_data_rsv_map *map,
}
/*
+ * Check qgroup limit and insert dirty range into reserve_map.
+ *
+ * Must be called with map->lock held.
+ */
+static int reserve_data_range(struct btrfs_root *root,
+ struct btrfs_qgroup_data_rsv_map *map,
+ struct data_rsv_range *tmp,
+ struct ulist *insert_list, u64 start, u64 len,
+ u64 *reserved)
+{
+ struct data_rsv_range *range;
+ u64 cur_start = 0;
+ u64 cur_len = 0;
+ u64 reserve = 0;
+ int ret = 0;
+
+ range = find_reserve_range(map, start);
+ /* empty tree, insert the whole range */
+ if (!range) {
+ reserve = len;
+ ret = ulist_add(insert_list, start, len, GFP_ATOMIC);
+ if (ret < 0)
+ return ret;
+ goto insert;
+ }
+
+ /* For case range is covering the leading part */
+ if (range->start <= start && range->start + range->len > start)
+ cur_start = range->start + range->len;
+ else
+ cur_start = start;
+
+ /*
+ * Iterate until the end of the range, like the following:
+ *
+ *        |<--------------desired------------->|
+ * |//1//|    |///2///|    |///3///|              <- exists
+ *
+ * Then we will need to insert the following:
+ *
+ *       |\\4\\|      |\\5\\|      |\\\6\\\|
+ *
+ * and only add to qgroup->reserved for ranges 4, 5 and 6.
+ */
+ while (cur_start < start + len) {
+ struct rb_node *next_node;
+ u64 next_start;
+
+ if (range->start + range->len <= cur_start) {
+ /*
+ * Move to next range if current range is before
+ * cur_start
+ * e.g range is 1, cur_start is the end of range 1.
+ */
+ next_node = rb_next(&range->node);
+ if (!next_node) {
+ /*
+ * no next range, fill the rest
+ * e.g range is 3, cur_start is end of range 3.
+ */
+ cur_len = start + len - cur_start;
+ next_start = start + len;
+ } else {
+ range = rb_entry(next_node,
+ struct data_rsv_range, node);
+ cur_len = min(range->start, start + len) -
+ cur_start;
+ next_start = range->start + range->len;
+ }
+ } else {
+ /*
+ * current range is already after cur_start
+ * e.g range is 2, cur_start is end of range 1.
+ */
+ cur_len = min(range->start, start + len) - cur_start;
+ next_start = range->start + range->len;
+ }
+ reserve += cur_len;
+ ret = ulist_add(insert_list, cur_start, cur_len, GFP_ATOMIC);
+ if (ret < 0)
+ return ret;
+
+ cur_start = next_start;
+ }
+insert:
+ ret = btrfs_qgroup_reserve(root, reserve);
+ if (ret < 0)
+ return ret;
+ /* Ranges must be inserted only after we are sure there is enough space */
+ ret = insert_data_ranges(map, tmp, insert_list);
+ map->reserved += reserve;
+ if (reserved)
+ *reserved = reserve;
+ return ret;
+}
+
+/*
* Init data_rsv_map for a given inode.
*
* This is needed at write time as quota can be disabled and then enabled
--
2.6.1
* [PATCH v2 06/23] btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (4 preceding siblings ...)
2015-10-09 2:15 ` [PATCH v2 05/23] btrfs: qgroup: Introduce function to reserve data range per inode Qu Wenruo
@ 2015-10-09 2:18 ` Qu Wenruo
2015-10-09 2:18 ` [PATCH v2 07/23] btrfs: qgroup: Introduce function to release reserved range Qu Wenruo
` (17 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:18 UTC (permalink / raw)
To: linux-btrfs
This new function will do all the hard work to reserve the precise
amount of space for a write.
The overall workflow is the following.
File A already has some dirty pages:
0 4K 8K 12K 16K
|///////| |///////|
Then someone wants to write data into range [4K, 16K):
|<------desired-------->|
Unlike the old, incorrect implementation, which would reserve 12K, this
function will only reserve space for the newly dirtied parts:
|\\\\\\\| |\\\\\\\|
which take only 8K of reserved space, as the other parts have already
reserved their own space.
So the final reserve map will be:
|///////////////////////////////|
This provides the basis to resolve the long existing qgroup limit bug.
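A hedged sketch of how a write path is expected to call the new
function (the wrapper name is illustrative; only
btrfs_qgroup_reserve_data() comes from this patch, and the matching
release/free half only arrives later in the series):

static int example_write_begin(struct inode *inode, u64 pos, u64 count)
{
	int ret;

	/* only the not-yet-reserved part of [pos, pos + count) is charged */
	ret = btrfs_qgroup_reserve_data(inode, pos, count);
	if (ret < 0)	/* quota exceeded or -ENOMEM; nothing was inserted */
		return ret;
	/* ... go on to dirty the pages ... */
	return 0;
}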
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
Add needed parameter for later trace functions
---
fs/btrfs/qgroup.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/qgroup.h | 1 +
2 files changed, 58 insertions(+)
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 3bdf28e..e840f5c 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2796,6 +2796,63 @@ insert:
}
/*
+ * Make sure the data space for [start, start + len) is reserved.
+ * It will either reserve new space from given qgroup or reuse the already
+ * reserved space.
+ *
+ * Return 0 for successful reserve.
+ * Return <0 for error.
+ *
+ * TODO: handle the nocow case, like NODATACOW or writes into prealloc
+ * space, along with other mixed cases.
+ * E.g. write 2M where the first 1M can be nocowed, but the next 1M is a
+ * hole and needs COW.
+ */
+int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len)
+{
+ struct btrfs_inode *binode = BTRFS_I(inode);
+ struct btrfs_root *root = binode->root;
+ struct btrfs_qgroup_data_rsv_map *reserve_map;
+ struct data_rsv_range *tmp = NULL;
+ struct ulist *insert_list;
+ int ret;
+
+ if (!root->fs_info->quota_enabled || !is_fstree(root->objectid) ||
+ len == 0)
+ return 0;
+
+ if (!binode->qgroup_rsv_map) {
+ ret = btrfs_qgroup_init_data_rsv_map(inode);
+ if (ret < 0)
+ return ret;
+ }
+ reserve_map = binode->qgroup_rsv_map;
+ insert_list = ulist_alloc(GFP_NOFS);
+ if (!insert_list)
+ return -ENOMEM;
+ tmp = kzalloc(sizeof(*tmp), GFP_NOFS);
+ if (!tmp) {
+ ulist_free(insert_list);
+ return -ENOMEM;
+ }
+
+ spin_lock(&reserve_map->lock);
+ ret = reserve_data_range(root, reserve_map, tmp, insert_list, start,
+ len, NULL);
+ /*
+ * For the error and already-exists cases, free the tmp memory.
+ * For the tmp-used case, set ret to 0, as some careless
+ * callers consider >0 an error.
+ */
+ if (ret <= 0)
+ kfree(tmp);
+ else
+ ret = 0;
+ spin_unlock(&reserve_map->lock);
+ ulist_free(insert_list);
+ return ret;
+}
+
+/*
* Init data_rsv_map for a given inode.
*
* This is needed at write time as quota can be disabled and then enabled
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index c87b7dc..366b853 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -87,4 +87,5 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid,
/* for qgroup reserve */
int btrfs_qgroup_init_data_rsv_map(struct inode *inode);
void btrfs_qgroup_free_data_rsv_map(struct inode *inode);
+int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len);
#endif /* __BTRFS_QGROUP__ */
--
2.6.1
* [PATCH v2 07/23] btrfs: qgroup: Introduce function to release reserved range
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (5 preceding siblings ...)
2015-10-09 2:18 ` [PATCH v2 06/23] btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function Qu Wenruo
@ 2015-10-09 2:18 ` Qu Wenruo
2015-10-09 2:18 ` [PATCH v2 08/23] btrfs: qgroup: Introduce function to release/free reserved data range Qu Wenruo
` (16 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:18 UTC (permalink / raw)
To: linux-btrfs
Introduce new function release_data_range() to release reserved ranges.
It will iterate through all existing ranges and remove/shrink them.
Note this function will not free reserved space, as a range can be
released under the following conditions:
1) The dirty range gets written to disk.
In this case, the reserved range will be released, but the reserved
bytes will not be freed until the delayed refs are run.
2) Truncate.
In this case, dirty ranges will be released and the reserved bytes
will also be freed.
So the new function won't free reserved space itself, but records the
released bytes into a parameter if the caller needs them.
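The subtle case is releasing the middle of one existing range, which
splits it in two; that is why the function takes a preallocated tmp
node. A userspace sketch of just the split arithmetic (illustrative
names; the caller guarantees the released span lies strictly inside
the range):

struct range { unsigned long long start, len; };

/* Shrink *r to the left-over left piece, fill *right with the right piece. */
static void split_range(struct range *r, struct range *right,
			unsigned long long start, unsigned long long len)
{
	right->start = start + len;
	right->len = r->start + r->len - (start + len);
	r->len = start - r->start;
}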
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
Add parameter for later trace functions
---
fs/btrfs/qgroup.c | 133 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 133 insertions(+)
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index e840f5c..9934929 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2852,6 +2852,139 @@ int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len)
return ret;
}
+/* Small helper used in release_data_range() to update rsv map */
+static inline void __update_rsv(struct btrfs_qgroup_data_rsv_map *map,
+ u64 *reserved, u64 cur_rsv)
+{
+ if (WARN_ON(map->reserved < cur_rsv)) {
+ if (reserved)
+ *reserved += map->reserved;
+ map->reserved = 0;
+ } else {
+ if (reserved)
+ *reserved += cur_rsv;
+ map->reserved -= cur_rsv;
+ }
+}
+
+/*
+ * Release the range [start, start + len) from rsv map.
+ *
+ * The behavior should be much like reserve_data_range().
+ * @tmp: the allocated memory for case which need to split existing
+ * range into two.
+ * @reserved: the number of bytes that may need to be freed
+ * Return > 0 if the 'tmp' memory is used and the range is released successfully
+ * Return 0 if the 'tmp' memory is not used and the range is released successfully
+ * Return < 0 for error
+ */
+static int release_data_range(struct btrfs_qgroup_data_rsv_map *map,
+ struct data_rsv_range *tmp,
+ u64 start, u64 len, u64 *reserved)
+{
+ struct data_rsv_range *range;
+ u64 cur_rsv = 0;
+ int ret = 0;
+
+ range = find_reserve_range(map, start);
+ /* empty tree, just return */
+ if (!range)
+ return 0;
+ /*
+ * For split case
+ * |<----desired---->|
+ * |////////////////////////////////////////////|
+ * In this case, we need to insert one new range.
+ */
+ if (range->start < start && range->start + range->len > start + len) {
+ u64 new_start = start + len;
+ u64 new_len = range->start + range->len - start - len;
+
+ cur_rsv = len;
+ if (reserved)
+ *reserved += cur_rsv;
+ map->reserved -= cur_rsv;
+
+ range->len = start - range->start;
+ ret = insert_data_range(map, tmp, new_start, new_len);
+ WARN_ON(ret <= 0);
+ return 1;
+ }
+
+ /*
+ * Iterate until the end of the range and release all
+ * reserved data from the map.
+ * We iterate by existing ranges, as that makes the code a
+ * little cleaner.
+ *
+ * |<---------desired------------------------>|
+ * |//1//| |//2//| |//3//| |//4//|
+ */
+ while (range->start < start + len) {
+ struct rb_node *next = NULL;
+ int range_freed = 0;
+
+ /*
+ * |<---desired---->|
+ * |///////|
+ */
+ if (unlikely(range->start + range->len <= start))
+ goto next;
+
+ /*
+ * |<----desired---->|
+ * |///////|
+ */
+ if (range->start < start &&
+ range->start + range->len > start) {
+ cur_rsv = range->start + range->len - start;
+
+ range->len = start - range->start;
+ goto next;
+ }
+
+ /*
+ * |<--desired-->|
+ * |/////|
+ * This includes the equal start/end case, so the other cases
+ * don't need to check for it, and don't need to bother
+ * deleting the range.
+ */
+ if (range->start >= start &&
+ range->start + range->len <= start + len) {
+ cur_rsv = range->len;
+
+ range_freed = 1;
+ next = rb_next(&range->node);
+ rb_erase(&range->node, &map->root);
+ kfree(range);
+ goto next;
+
+ }
+
+ /*
+ * |<--desired-->|
+ * |///////|
+ */
+ if (range->start < start + len &&
+ range->start + range->len > start + len) {
+ cur_rsv = start + len - range->start;
+
+ range->len = range->start + range->len - start - len;
+ range->start = start + len;
+ goto next;
+ }
+next:
+ __update_rsv(map, reserved, cur_rsv);
+ if (!range_freed)
+ next = rb_next(&range->node);
+ if (!next)
+ break;
+ range = rb_entry(next, struct data_rsv_range, node);
+ }
+ return 0;
+}
+
/*
* Init data_rsv_map for a given inode.
*
--
2.6.1
* [PATCH v2 08/23] btrfs: qgroup: Introduce function to release/free reserved data range
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (6 preceding siblings ...)
2015-10-09 2:18 ` [PATCH v2 07/23] btrfs: qgroup: Introduce function to release reserved range Qu Wenruo
@ 2015-10-09 2:18 ` Qu Wenruo
2015-10-09 2:18 ` [PATCH v2 09/23] btrfs: delayed_ref: Add new function to record reserved space into delayed ref Qu Wenruo
` (15 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:18 UTC (permalink / raw)
To: linux-btrfs
Introduce functions btrfs_qgroup_release/free_data() to release/free
reserved data ranges.
Release means just removing the data range from the data rsv map,
without freeing the reserved space.
This is for the normal buffered write case: when data is written to disk
and its metadata is added into the tree, its reserved space should still
be kept until commit_trans().
So in that case we only release the dirty range, but keep the reserved
space recorded elsewhere until commit_trans().
Free means not only removing the data range, but also freeing the
reserved space.
This is used for cleanup cases.
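A hedged sketch of the intended call sites (the wrapper names are
illustrative; the two qgroup calls are the ones added below):

static void on_writeback_finished(struct inode *inode, u64 start, u64 len)
{
	/* range leaves the dirty map; bytes stay charged until commit */
	btrfs_qgroup_release_data(inode, start, len);
}

static void on_page_invalidated(struct inode *inode, u64 start, u64 len)
{
	/* data never reaches disk: drop range and reservation together */
	btrfs_qgroup_free_data(inode, start, len);
}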
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
Fix comment typo
Update comment, to make it clear that the reserved space for any page
cache will either be released(it goes to disk) or freed directly
(truncated before reaching disk)
---
fs/btrfs/qgroup.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
fs/btrfs/qgroup.h | 2 ++
2 files changed, 56 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 9934929..dbc0d06 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -85,7 +85,7 @@ struct btrfs_qgroup {
/*
* temp variables for accounting operations
- * Refer to qgroup_shared_accouting() for details.
+ * Refer to qgroup_shared_accounting() for details.
*/
u64 old_refcnt;
u64 new_refcnt;
@@ -2985,6 +2985,59 @@ next:
return 0;
}
+static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
+ int free_reserved)
+{
+ struct data_rsv_range *tmp;
+ struct btrfs_qgroup_data_rsv_map *map;
+ u64 reserved = 0;
+ int ret;
+
+ spin_lock(&BTRFS_I(inode)->qgroup_init_lock);
+ map = BTRFS_I(inode)->qgroup_rsv_map;
+ spin_unlock(&BTRFS_I(inode)->qgroup_init_lock);
+ if (!map)
+ return 0;
+
+ tmp = kmalloc(sizeof(*tmp), GFP_NOFS);
+ if (!tmp)
+ return -ENOMEM;
+ spin_lock(&map->lock);
+ ret = release_data_range(map, tmp, start, len, &reserved);
+ /* release_data_range() won't fail; only check whether tmp memory was used */
+ if (ret == 0)
+ kfree(tmp);
+ if (free_reserved)
+ btrfs_qgroup_free(BTRFS_I(inode)->root, reserved);
+ spin_unlock(&map->lock);
+ return 0;
+}
+
+/*
+ * Free a reserved space range from its qgroup.
+ *
+ * Should be called when a delalloc page cache page is going to be
+ * invalidated. Such a page will either be released (it was written to
+ * disk) or freed directly (it never reaches disk).
+ */
+int btrfs_qgroup_free_data(struct inode *inode, u64 start, u64 len)
+{
+ return __btrfs_qgroup_release_data(inode, start, len, 1);
+}
+
+/*
+ * Release a reserved space range, but don't free its qgroup reserved
+ * space. The reservation still takes up space until the delayed refs
+ * are run.
+ *
+ * As qgroup accounting happens at commit time, for data written to disk
+ * its reserved space should not be freed until commit,
+ * or we may exceed the limit.
+ */
+int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len)
+{
+ return __btrfs_qgroup_release_data(inode, start, len, 0);
+}
+
/*
* Init data_rsv_map for a given inode.
*
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 366b853..8e69dc1 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -88,4 +88,6 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid,
int btrfs_qgroup_init_data_rsv_map(struct inode *inode);
void btrfs_qgroup_free_data_rsv_map(struct inode *inode);
int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len);
+int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len);
+int btrfs_qgroup_free_data(struct inode *inode, u64 start, u64 len);
#endif /* __BTRFS_QGROUP__ */
--
2.6.1
* [PATCH v2 09/23] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (7 preceding siblings ...)
2015-10-09 2:18 ` [PATCH v2 08/23] btrfs: qgroup: Introduce function to release/free reserved data range Qu Wenruo
@ 2015-10-09 2:18 ` Qu Wenruo
2015-10-09 2:22 ` [PATCH v2 10/23] btrfs: delayed_ref: release and free qgroup reserved at proper timing Qu Wenruo
` (14 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:18 UTC (permalink / raw)
To: linux-btrfs
Add a new function, btrfs_add_delayed_qgroup_reserve(), to record how
much space is reserved for a given extent.
As btrfs only accounts qgroups at run_delayed_refs() time, a newly
allocated extent should keep its reserved space until then.
So add the needed function, with the related members, to do it.
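A hedged usage sketch: right after the BTRFS_ADD_DELAYED_EXTENT ref is
queued for a freshly allocated extent, the reservation is stashed on
its head (the wrapper name is illustrative; the callee and its
signature come from this patch):

static int stash_qgroup_reserve(struct btrfs_trans_handle *trans,
				struct btrfs_root *root,
				u64 bytenr, u64 num_bytes)
{
	/* -ENOENT: no delayed ref head queued at bytenr in this trans */
	return btrfs_add_delayed_qgroup_reserve(root->fs_info, trans,
						root->objectid, bytenr,
						num_bytes);
}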
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
fs/btrfs/delayed-ref.c | 29 +++++++++++++++++++++++++++++
fs/btrfs/delayed-ref.h | 14 ++++++++++++++
2 files changed, 43 insertions(+)
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index ac3e81d..bd9b63b 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -476,6 +476,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
INIT_LIST_HEAD(&head_ref->ref_list);
head_ref->processing = 0;
head_ref->total_ref_mod = count_mod;
+ head_ref->qgroup_reserved = 0;
+ head_ref->qgroup_ref_root = 0;
/* Record qgroup extent info if provided */
if (qrecord) {
@@ -746,6 +748,33 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
return 0;
}
+int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
+ struct btrfs_trans_handle *trans,
+ u64 ref_root, u64 bytenr, u64 num_bytes)
+{
+ struct btrfs_delayed_ref_root *delayed_refs;
+ struct btrfs_delayed_ref_head *ref_head;
+ int ret = 0;
+
+ if (!fs_info->quota_enabled || !is_fstree(ref_root))
+ return 0;
+
+ delayed_refs = &trans->transaction->delayed_refs;
+
+ spin_lock(&delayed_refs->lock);
+ ref_head = find_ref_head(&delayed_refs->href_root, bytenr, 0);
+ if (!ref_head) {
+ ret = -ENOENT;
+ goto out;
+ }
+ WARN_ON(ref_head->qgroup_reserved || ref_head->qgroup_ref_root);
+ ref_head->qgroup_ref_root = ref_root;
+ ref_head->qgroup_reserved = num_bytes;
+out:
+ spin_unlock(&delayed_refs->lock);
+ return ret;
+}
+
int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
struct btrfs_trans_handle *trans,
u64 bytenr, u64 num_bytes,
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index 13fb5e6..d4c41e2 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -113,6 +113,17 @@ struct btrfs_delayed_ref_head {
int total_ref_mod;
/*
+ * For qgroup reserved space freeing.
+ *
+ * ref_root and reserved are recorded after
+ * BTRFS_ADD_DELAYED_EXTENT is called,
+ * and are used to free the reserved qgroup space at
+ * run_delayed_refs() time.
+ */
+ u64 qgroup_ref_root;
+ u64 qgroup_reserved;
+
+ /*
* when a new extent is allocated, it is just reserved in memory
* The actual extent isn't inserted into the extent allocation tree
* until the delayed ref is processed. must_insert_reserved is
@@ -242,6 +253,9 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
u64 owner, u64 offset, int action,
struct btrfs_delayed_extent_op *extent_op,
int no_quota);
+int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
+ struct btrfs_trans_handle *trans,
+ u64 ref_root, u64 bytenr, u64 num_bytes);
int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
struct btrfs_trans_handle *trans,
u64 bytenr, u64 num_bytes,
--
2.6.1
* [PATCH v2 10/23] btrfs: delayed_ref: release and free qgroup reserved at proper timing
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (8 preceding siblings ...)
2015-10-09 2:18 ` [PATCH v2 09/23] btrfs: delayed_ref: Add new function to record reserved space into delayed ref Qu Wenruo
@ 2015-10-09 2:22 ` Qu Wenruo
2015-10-09 2:22 ` [PATCH v2 11/23] btrfs: qgroup: Introduce new functions to reserve/free metadata Qu Wenruo
` (13 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:22 UTC (permalink / raw)
To: linux-btrfs
Qgroup reserved space needs to be released from the inode dirty map and
freed at two different times:
1) Release when the metadata is written into the tree.
After the corresponding metadata is written into the tree, any newer
write will be COWed (the NOCOW case is not covered yet).
So we must release its range from the inode dirty range map, or we will
forget to reserve the range needed by the newer write, causing the
accounting to exceed the limit.
2) Free the reserved bytes when the delayed ref is run.
When delayed refs are run, qgroup accounting will follow soon and turn
the reserved bytes into rfer/excl numbers.
As run_delayed_refs() and qgroup accounting are both done at
commit_transaction() time, we are safe to free the reserved space at
run_delayed_refs() time.
With this timing for releasing/freeing reserved space, we should be able
to resolve the long-standing qgroup reserved space leak problem.
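Putting the series together so far, the lifetime of one data extent's
reservation looks like this (a sketch; the step numbers match the
commit message above):

/*
 * btrfs_qgroup_reserve_data()          write dirties the range
 *     |
 * btrfs_qgroup_release_data()          1) file extent item written,
 * btrfs_add_delayed_qgroup_reserve()      bytes move to the ref head
 *     |
 * btrfs_qgroup_free_delayed_ref()      2) run_delayed_refs(): bytes are
 *                                         freed and soon accounted as
 *                                         rfer/excl numbers
 */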
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
Use a better-wrapped function for the delayed_ref reserved space
release, as a direct call to btrfs_qgroup_free_ref() would make it hard
to add a trace event.
---
fs/btrfs/extent-tree.c | 5 +++++
fs/btrfs/inode.c | 10 ++++++++++
fs/btrfs/qgroup.c | 5 ++---
fs/btrfs/qgroup.h | 18 +++++++++++++++++-
4 files changed, 34 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 601d7d4..4f6758b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2345,6 +2345,11 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
node->num_bytes);
}
}
+
+ /* Also free its reserved qgroup space */
+ btrfs_qgroup_free_delayed_ref(root->fs_info,
+ head->qgroup_ref_root,
+ head->qgroup_reserved);
return ret;
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 79ad301..8ca2993 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2112,6 +2112,16 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
ret = btrfs_alloc_reserved_file_extent(trans, root,
root->root_key.objectid,
btrfs_ino(inode), file_pos, &ins);
+ if (ret < 0)
+ goto out;
+ /*
+ * Release the reserved range from inode dirty range map, and
+ * move it to delayed ref codes, as now accounting only happens at
+ * commit_transaction() time.
+ */
+ btrfs_qgroup_release_data(inode, file_pos, ram_bytes);
+ ret = btrfs_add_delayed_qgroup_reserve(root->fs_info, trans,
+ root->objectid, disk_bytenr, ram_bytes);
out:
btrfs_free_path(path);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index dbc0d06..1f03f9d 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2169,14 +2169,13 @@ out:
return ret;
}
-void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes)
+void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info,
+ u64 ref_root, u64 num_bytes)
{
struct btrfs_root *quota_root;
struct btrfs_qgroup *qgroup;
- struct btrfs_fs_info *fs_info = root->fs_info;
struct ulist_node *unode;
struct ulist_iterator uiter;
- u64 ref_root = root->root_key.objectid;
int ret = 0;
if (!is_fstree(ref_root))
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 8e69dc1..c7ee46a 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -75,7 +75,23 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
struct btrfs_qgroup_inherit *inherit);
int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
-void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);
+void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info,
+ u64 ref_root, u64 num_bytes);
+static inline void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes)
+{
+ return btrfs_qgroup_free_refroot(root->fs_info, root->objectid,
+ num_bytes);
+}
+
+/*
+ * TODO: Add a proper trace point for it, as btrfs_qgroup_free() is
+ * called from everywhere and can't provide a good trace for the
+ * delayed ref case.
+ */
+static inline void btrfs_qgroup_free_delayed_ref(struct btrfs_fs_info *fs_info,
+ u64 ref_root, u64 num_bytes)
+{
+ btrfs_qgroup_free_refroot(fs_info, ref_root, num_bytes);
+}
void assert_qgroups_uptodate(struct btrfs_trans_handle *trans);
--
2.6.1
* [PATCH v2 11/23] btrfs: qgroup: Introduce new functions to reserve/free metadata
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (9 preceding siblings ...)
2015-10-09 2:22 ` [PATCH v2 10/23] btrfs: delayed_ref: release and free qgroup reserved at proper timing Qu Wenruo
@ 2015-10-09 2:22 ` Qu Wenruo
2015-10-09 2:22 ` [PATCH v2 12/23] btrfs: qgroup: Use new metadata reservation Qu Wenruo
` (12 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:22 UTC (permalink / raw)
To: linux-btrfs
Introduce new functions btrfs_qgroup_reserve/free_meta() to reserve/free
metadata reserved space.
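The per-root counter added here reduces to a simple atomic pattern:
reserve accumulates, free drops piecewise, and free_all drains the
whole counter with an exchange, mirroring the atomic_xchg() in the
patch. A userspace model using C11 atomics (illustrative names):

#include <stdatomic.h>

static atomic_int meta_rsv;		/* models root->qgroup_meta_rsv */

static void meta_reserve(int bytes) { atomic_fetch_add(&meta_rsv, bytes); }
static void meta_free(int bytes)    { atomic_fetch_sub(&meta_rsv, bytes); }
static int  meta_free_all(void)     { return atomic_exchange(&meta_rsv, 0); }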
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
fs/btrfs/ctree.h | 3 +++
fs/btrfs/disk-io.c | 1 +
fs/btrfs/qgroup.c | 40 ++++++++++++++++++++++++++++++++++++++++
fs/btrfs/qgroup.h | 4 ++++
4 files changed, 48 insertions(+)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 938efe3..ae86025 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1943,6 +1943,9 @@ struct btrfs_root {
int send_in_progress;
struct btrfs_subvolume_writers *subv_writers;
atomic_t will_be_snapshoted;
+
+ /* For qgroup metadata space reserve */
+ atomic_t qgroup_meta_rsv;
};
struct btrfs_ioctl_defrag_range_args {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 807f685..2b51705 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1259,6 +1259,7 @@ static void __setup_root(u32 nodesize, u32 sectorsize, u32 stripesize,
atomic_set(&root->orphan_inodes, 0);
atomic_set(&root->refs, 1);
atomic_set(&root->will_be_snapshoted, 0);
+ atomic_set(&root->qgroup_meta_rsv, 0);
root->log_transid = 0;
root->log_transid_committed = -1;
root->last_log_commit = 0;
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 1f03f9d..b7f6ce1 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3120,3 +3120,43 @@ void btrfs_qgroup_free_data_rsv_map(struct inode *inode)
kfree(dirty_map);
binode->qgroup_rsv_map = NULL;
}
+
+int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes)
+{
+ int ret;
+
+ if (!root->fs_info->quota_enabled || !is_fstree(root->objectid) ||
+ num_bytes == 0)
+ return 0;
+
+ BUG_ON(num_bytes != round_down(num_bytes, root->nodesize));
+ ret = btrfs_qgroup_reserve(root, num_bytes);
+ if (ret < 0)
+ return ret;
+ atomic_add(num_bytes, &root->qgroup_meta_rsv);
+ return ret;
+}
+
+void btrfs_qgroup_free_meta_all(struct btrfs_root *root)
+{
+ int reserved;
+
+ if (!root->fs_info->quota_enabled || !is_fstree(root->objectid))
+ return;
+
+ reserved = atomic_xchg(&root->qgroup_meta_rsv, 0);
+ if (reserved == 0)
+ return;
+ btrfs_qgroup_free(root, reserved);
+}
+
+void btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes)
+{
+ if (!root->fs_info->quota_enabled || !is_fstree(root->objectid))
+ return;
+
+ BUG_ON(num_bytes != round_down(num_bytes, root->nodesize));
+ WARN_ON(atomic_read(&root->qgroup_meta_rsv) < num_bytes);
+ atomic_sub(num_bytes, &root->qgroup_meta_rsv);
+ btrfs_qgroup_free(root, num_bytes);
+}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index c7ee46a..47d75cb 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -106,4 +106,8 @@ void btrfs_qgroup_free_data_rsv_map(struct inode *inode);
int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len);
int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len);
int btrfs_qgroup_free_data(struct inode *inode, u64 start, u64 len);
+
+int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes);
+void btrfs_qgroup_free_meta_all(struct btrfs_root *root);
+void btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes);
#endif /* __BTRFS_QGROUP__ */
--
2.6.1
* [PATCH v2 12/23] btrfs: qgroup: Use new metadata reservation.
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (10 preceding siblings ...)
2015-10-09 2:22 ` [PATCH v2 11/23] btrfs: qgroup: Introduce new functions to reserve/free metadata Qu Wenruo
@ 2015-10-09 2:22 ` Qu Wenruo
2015-10-09 2:22 ` [PATCH v2 13/23] btrfs: extent-tree: Add new version of btrfs_check_data_free_space and btrfs_free_reserved_data_space Qu Wenruo
` (11 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:22 UTC (permalink / raw)
To: linux-btrfs
As we have the new metadata reservation functions, use them to replace
the old btrfs_qgroup_reserve() call for metadata.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
fs/btrfs/extent-tree.c | 14 ++++++--------
fs/btrfs/transaction.c | 34 ++++++----------------------------
fs/btrfs/transaction.h | 1 -
3 files changed, 12 insertions(+), 37 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4f6758b..22702bd 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5345,7 +5345,7 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
if (root->fs_info->quota_enabled) {
/* One for parent inode, two for dir entries */
num_bytes = 3 * root->nodesize;
- ret = btrfs_qgroup_reserve(root, num_bytes);
+ ret = btrfs_qgroup_reserve_meta(root, num_bytes);
if (ret)
return ret;
} else {
@@ -5363,10 +5363,8 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
if (ret == -ENOSPC && use_global_rsv)
ret = btrfs_block_rsv_migrate(global_rsv, rsv, num_bytes);
- if (ret) {
- if (*qgroup_reserved)
- btrfs_qgroup_free(root, *qgroup_reserved);
- }
+ if (ret && *qgroup_reserved)
+ btrfs_qgroup_free_meta(root, *qgroup_reserved);
return ret;
}
@@ -5527,15 +5525,15 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
spin_unlock(&BTRFS_I(inode)->lock);
if (root->fs_info->quota_enabled) {
- ret = btrfs_qgroup_reserve(root, nr_extents * root->nodesize);
+ ret = btrfs_qgroup_reserve_meta(root,
+ nr_extents * root->nodesize);
if (ret)
goto out_fail;
}
ret = reserve_metadata_bytes(root, block_rsv, to_reserve, flush);
if (unlikely(ret)) {
- if (root->fs_info->quota_enabled)
- btrfs_qgroup_free(root, nr_extents * root->nodesize);
+ btrfs_qgroup_free_meta(root, nr_extents * root->nodesize);
goto out_fail;
}
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 376191c..5ed06b8 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -478,13 +478,10 @@ start_transaction(struct btrfs_root *root, u64 num_items, unsigned int type,
* the appropriate flushing if need be.
*/
if (num_items > 0 && root != root->fs_info->chunk_root) {
- if (root->fs_info->quota_enabled &&
- is_fstree(root->root_key.objectid)) {
- qgroup_reserved = num_items * root->nodesize;
- ret = btrfs_qgroup_reserve(root, qgroup_reserved);
- if (ret)
- return ERR_PTR(ret);
- }
+ qgroup_reserved = num_items * root->nodesize;
+ ret = btrfs_qgroup_reserve_meta(root, qgroup_reserved);
+ if (ret)
+ return ERR_PTR(ret);
num_bytes = btrfs_calc_trans_metadata_size(root, num_items);
/*
@@ -553,7 +550,6 @@ again:
h->block_rsv = NULL;
h->orig_rsv = NULL;
h->aborted = 0;
- h->qgroup_reserved = 0;
h->delayed_ref_elem.seq = 0;
h->type = type;
h->allocating_chunk = false;
@@ -579,7 +575,6 @@ again:
h->bytes_reserved = num_bytes;
h->reloc_reserved = reloc_reserved;
}
- h->qgroup_reserved = qgroup_reserved;
got_it:
btrfs_record_root_in_trans(h, root);
@@ -597,8 +592,7 @@ alloc_fail:
btrfs_block_rsv_release(root, &root->fs_info->trans_block_rsv,
num_bytes);
reserve_fail:
- if (qgroup_reserved)
- btrfs_qgroup_free(root, qgroup_reserved);
+ btrfs_qgroup_free_meta(root, qgroup_reserved);
return ERR_PTR(ret);
}
@@ -815,15 +809,6 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
must_run_delayed_refs = 2;
}
- if (trans->qgroup_reserved) {
- /*
- * the same root has to be passed here between start_transaction
- * and end_transaction. Subvolume quota depends on this.
- */
- btrfs_qgroup_free(trans->root, trans->qgroup_reserved);
- trans->qgroup_reserved = 0;
- }
-
btrfs_trans_release_metadata(trans, root);
trans->block_rsv = NULL;
@@ -1238,6 +1223,7 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans,
spin_lock(&fs_info->fs_roots_radix_lock);
if (err)
break;
+ btrfs_qgroup_free_meta_all(root);
}
}
spin_unlock(&fs_info->fs_roots_radix_lock);
@@ -1846,10 +1832,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
btrfs_trans_release_metadata(trans, root);
trans->block_rsv = NULL;
- if (trans->qgroup_reserved) {
- btrfs_qgroup_free(root, trans->qgroup_reserved);
- trans->qgroup_reserved = 0;
- }
cur_trans = trans->transaction;
@@ -2202,10 +2184,6 @@ cleanup_transaction:
btrfs_trans_release_metadata(trans, root);
btrfs_trans_release_chunk_metadata(trans);
trans->block_rsv = NULL;
- if (trans->qgroup_reserved) {
- btrfs_qgroup_free(root, trans->qgroup_reserved);
- trans->qgroup_reserved = 0;
- }
btrfs_warn(root->fs_info, "Skipping commit of aborted transaction.");
if (current->journal_info == trans)
current->journal_info = NULL;
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index a994bb0..ce41bc9 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -107,7 +107,6 @@ struct btrfs_trans_handle {
u64 transid;
u64 bytes_reserved;
u64 chunk_bytes_reserved;
- u64 qgroup_reserved;
unsigned long use_count;
unsigned long blocks_reserved;
unsigned long blocks_used;
--
2.6.1
* [PATCH v2 13/23] btrfs: extent-tree: Add new version of btrfs_check_data_free_space and btrfs_free_reserved_data_space.
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (11 preceding siblings ...)
2015-10-09 2:22 ` [PATCH v2 12/23] btrfs: qgroup: Use new metadata reservation Qu Wenruo
@ 2015-10-09 2:22 ` Qu Wenruo
2015-10-09 2:25 ` [PATCH v2 14/23] btrfs: extent-tree: Switch to new check_data_free_space and free_reserved_data_space Qu Wenruo
` (10 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:22 UTC (permalink / raw)
To: linux-btrfs
Add new functions __btrfs_check_data_free_space() and
__btrfs_free_reserved_data_space() to work with the new accurate qgroup
reserved space framework.
The new functions will replace the old btrfs_check_data_free_space() and
btrfs_free_reserved_data_space() respectively, but until all the changes
are done, let's just use the new names.
Also, export the internal-use function btrfs_alloc_data_chunk_ondemand(),
as qgroup reservation now requires a precise byte count, and some
operations can't get the accurate number in advance (like fallocate).
But the data space info check and data chunk allocation don't need to be
that accurate, and can be called at the beginning.
So export it for later operations.
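The sectorsize alignment done by the new helpers can be checked with a
tiny userspace program (round_up/round_down assume a power-of-two
alignment, like the kernel helpers):

#include <stdio.h>

#define round_down(x, a) ((x) & ~((unsigned long long)(a) - 1))
#define round_up(x, a)   round_down((x) + (a) - 1, (a))

int main(void)
{
	unsigned long long start = 3000, len = 2000, sectorsize = 4096;
	unsigned long long a_start = round_down(start, sectorsize);
	unsigned long long a_len = round_up(start + len, sectorsize) - a_start;

	/* a write of [3000, 5000) reserves the aligned [0, 8192) */
	printf("reserve [%llu, %llu)\n", a_start, a_start + a_len);
	return 0;
}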
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
Fix comment typo
Add the __btrfs_free_reserved_data_space() function, or we would leak
reserved space in the EDQUOT error handling routine.
---
fs/btrfs/ctree.h | 3 ++
fs/btrfs/extent-tree.c | 85 ++++++++++++++++++++++++++++++++++++++++++++------
2 files changed, 79 insertions(+), 9 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ae86025..19450a1 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3453,7 +3453,10 @@ enum btrfs_reserve_flush_enum {
};
int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes);
+int __btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len);
+int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes);
void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes);
+void __btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len);
void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
struct btrfs_root *root);
void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 22702bd..0cd6baa 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3908,11 +3908,7 @@ u64 btrfs_get_alloc_profile(struct btrfs_root *root, int data)
return ret;
}
-/*
- * This will check the space that the inode allocates from to make sure we have
- * enough space for bytes.
- */
-int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes)
+int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes)
{
struct btrfs_space_info *data_sinfo;
struct btrfs_root *root = BTRFS_I(inode)->root;
@@ -4033,19 +4029,55 @@ commit_trans:
data_sinfo->flags, bytes, 1);
return -ENOSPC;
}
- ret = btrfs_qgroup_reserve(root, write_bytes);
- if (ret)
- goto out;
data_sinfo->bytes_may_use += bytes;
trace_btrfs_space_reservation(root->fs_info, "space_info",
data_sinfo->flags, bytes, 1);
-out:
spin_unlock(&data_sinfo->lock);
return ret;
}
/*
+ * This will check the space that the inode allocates from to make sure we have
+ * enough space for bytes.
+ */
+int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes)
+{
+ struct btrfs_root *root = BTRFS_I(inode)->root;
+ int ret;
+
+ ret = btrfs_alloc_data_chunk_ondemand(inode, bytes);
+ if (ret < 0)
+ return ret;
+ ret = btrfs_qgroup_reserve(root, write_bytes);
+ return ret;
+}
+
+/*
+ * New check_data_free_space() with the ability to do precise data
+ * reservation.
+ * It will replace the old btrfs_check_data_free_space(); to keep the
+ * patch split clean, add the new function first and replace it later.
+ */
+int __btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
+{
+ struct btrfs_root *root = BTRFS_I(inode)->root;
+ int ret;
+
+ /* align the range */
+ len = round_up(start + len, root->sectorsize) -
+ round_down(start, root->sectorsize);
+ start = round_down(start, root->sectorsize);
+
+ ret = btrfs_alloc_data_chunk_ondemand(inode, len);
+ if (ret < 0)
+ return ret;
+
+ /* Use the new btrfs_qgroup_reserve_data() to reserve precise data space */
+ ret = btrfs_qgroup_reserve_data(inode, start, len);
+ return ret;
+}
+
+/*
* Called if we need to clear a data reservation for this inode.
*/
void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes)
@@ -4065,6 +4097,41 @@ void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes)
spin_unlock(&data_sinfo->lock);
}
+/*
+ * Called if we need to clear a data reservation for this inode,
+ * normally in an error case.
+ *
+ * This one will handle the per-inode data rsv map for the accurate
+ * reserved space framework.
+ */
+void __btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len)
+{
+ struct btrfs_root *root = BTRFS_I(inode)->root;
+ struct btrfs_space_info *data_sinfo;
+
+ /* Make sure the range is aligned to sectorsize */
+ len = round_up(start + len, root->sectorsize) -
+ round_down(start, root->sectorsize);
+ start = round_down(start, root->sectorsize);
+
+ /*
+ * Free any reserved qgroup data space first.
+ * As it may allocate memory, we can't do it with the data sinfo
+ * spinlock held.
+ */
+ btrfs_qgroup_free_data(inode, start, len);
+
+ data_sinfo = root->fs_info->data_sinfo;
+ spin_lock(&data_sinfo->lock);
+ if (WARN_ON(data_sinfo->bytes_may_use < len))
+ data_sinfo->bytes_may_use = 0;
+ else
+ data_sinfo->bytes_may_use -= len;
+ trace_btrfs_space_reservation(root->fs_info, "space_info",
+ data_sinfo->flags, len, 0);
+ spin_unlock(&data_sinfo->lock);
+}
+
static void force_metadata_allocation(struct btrfs_fs_info *info)
{
struct list_head *head = &info->space_info;
--
2.6.1
* [PATCH v2 14/23] btrfs: extent-tree: Switch to new check_data_free_space and free_reserved_data_space
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (12 preceding siblings ...)
2015-10-09 2:22 ` [PATCH v2 13/23] btrfs: extent-tree: Add new version of btrfs_check_data_free_space and btrfs_free_reserved_data_space Qu Wenruo
@ 2015-10-09 2:25 ` Qu Wenruo
2015-10-09 2:25 ` [PATCH v2 15/23] btrfs: extent-tree: Add new version of btrfs_delalloc_reserve/release_space Qu Wenruo
` (9 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:25 UTC (permalink / raw)
To: linux-btrfs
Use the new reserve/free functions for buffered write and the inode cache.
For the buffered write case, a nodatacow write won't increase the quota
account. So unlike the old behavior, which reserved space before checking
nocow, we now check nocow first and only reserve data space if we can't do
a nocow write.
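In sketch form, the reordered flow in __btrfs_buffered_write() looks
roughly like this (simplified from the file.c hunk below; error handling
omitted):

    if ((BTRFS_I(inode)->flags &
         (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC)) &&
        check_can_nocow(inode, pos, &write_bytes) > 0) {
            /* nocow write: no data space reservation needed */
            only_release_metadata = true;
    } else {
            /* cow write: reserve data space for the exact range */
            ret = __btrfs_check_data_free_space(inode, pos, write_bytes);
    }
    ret = btrfs_delalloc_reserve_metadata(inode, reserve_bytes);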
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
Also call the new free function; otherwise we would leak reserved space
when the data reservation succeeds but the metadata reservation fails.
---
fs/btrfs/extent-tree.c | 4 ++--
fs/btrfs/file.c | 34 +++++++++++++++++++++-------------
fs/btrfs/relocation.c | 8 ++++----
3 files changed, 27 insertions(+), 19 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0cd6baa..f4b9db8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3356,7 +3356,7 @@ again:
num_pages *= 16;
num_pages *= PAGE_CACHE_SIZE;
- ret = btrfs_check_data_free_space(inode, num_pages, num_pages);
+ ret = __btrfs_check_data_free_space(inode, 0, num_pages);
if (ret)
goto out_put;
@@ -3365,7 +3365,7 @@ again:
&alloc_hint);
if (!ret)
dcs = BTRFS_DC_SETUP;
- btrfs_free_reserved_data_space(inode, num_pages);
+ __btrfs_free_reserved_data_space(inode, 0, num_pages);
out_put:
iput(inode);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index b823fac..142b217 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1510,12 +1510,17 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
}
reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
- ret = btrfs_check_data_free_space(inode, reserve_bytes, write_bytes);
- if (ret == -ENOSPC &&
- (BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
- BTRFS_INODE_PREALLOC))) {
+
+ if (BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
+ BTRFS_INODE_PREALLOC)) {
ret = check_can_nocow(inode, pos, &write_bytes);
+ if (ret < 0)
+ break;
if (ret > 0) {
+ /*
+ * For the nodatacow case, no need to
+ * reserve data space.
+ */
only_release_metadata = true;
/*
* our prealloc extent may be smaller than
@@ -1524,20 +1529,19 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
num_pages = DIV_ROUND_UP(write_bytes + offset,
PAGE_CACHE_SIZE);
reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
- ret = 0;
- } else {
- ret = -ENOSPC;
+ goto reserve_metadata;
}
}
-
- if (ret)
+ ret = __btrfs_check_data_free_space(inode, pos, write_bytes);
+ if (ret < 0)
break;
+reserve_metadata:
ret = btrfs_delalloc_reserve_metadata(inode, reserve_bytes);
if (ret) {
if (!only_release_metadata)
- btrfs_free_reserved_data_space(inode,
- reserve_bytes);
+ __btrfs_free_reserved_data_space(inode, pos,
+ write_bytes);
else
btrfs_end_write_no_snapshoting(root);
break;
@@ -2569,8 +2573,11 @@ static long btrfs_fallocate(struct file *file, int mode,
/*
* Make sure we have enough space before we do the
* allocation.
+ * XXX: The behavior must be changed to do the accurate check
+ * first and only then reserve the data space.
*/
- ret = btrfs_check_data_free_space(inode, alloc_end - alloc_start, alloc_end - alloc_start);
+ ret = btrfs_check_data_free_space(inode, alloc_start,
+ alloc_end - alloc_start);
if (ret)
return ret;
@@ -2703,7 +2710,8 @@ static long btrfs_fallocate(struct file *file, int mode,
out:
mutex_unlock(&inode->i_mutex);
/* Let go of our reservation. */
- btrfs_free_reserved_data_space(inode, alloc_end - alloc_start);
+ __btrfs_free_reserved_data_space(inode, alloc_start,
+ alloc_end - alloc_start);
return ret;
}
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 303babe..f4621c5 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3034,8 +3034,8 @@ int prealloc_file_extent_cluster(struct inode *inode,
BUG_ON(cluster->start != cluster->boundary[0]);
mutex_lock(&inode->i_mutex);
- ret = btrfs_check_data_free_space(inode, cluster->end +
- 1 - cluster->start, 0);
+ ret = __btrfs_check_data_free_space(inode, cluster->start,
+ cluster->end + 1 - cluster->start);
if (ret)
goto out;
@@ -3056,8 +3056,8 @@ int prealloc_file_extent_cluster(struct inode *inode,
break;
nr++;
}
- btrfs_free_reserved_data_space(inode, cluster->end +
- 1 - cluster->start);
+ __btrfs_free_reserved_data_space(inode, cluster->start,
+ cluster->end + 1 - cluster->start);
out:
mutex_unlock(&inode->i_mutex);
return ret;
--
2.6.1
* [PATCH v2 15/23] btrfs: extent-tree: Add new version of btrfs_delalloc_reserve/release_space
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (13 preceding siblings ...)
2015-10-09 2:25 ` [PATCH v2 14/23] btrfs: extent-tree: Switch to new check_data_free_space and free_reserved_data_space Qu Wenruo
@ 2015-10-09 2:25 ` Qu Wenruo
2015-10-09 2:25 ` [PATCH v2 16/23] btrfs: extent-tree: Switch to new delalloc space reserve and release Qu Wenruo
` (8 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:25 UTC (permalink / raw)
To: linux-btrfs
Add new versions of the btrfs_delalloc_reserve_space() and
btrfs_delalloc_release_space() functions, which support accurate qgroup
reserving.
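A minimal usage sketch (illustrative; the arguments mirror the later
btrfs_page_mkwrite() conversion, and error_before_writeback stands in for
whatever failure the caller hits):

    ret = __btrfs_delalloc_reserve_space(inode, page_start,
                                         PAGE_CACHE_SIZE);
    if (ret)
            return ret;
    /* ... dirty the page ... */
    if (error_before_writeback)
            /* frees both the metadata reservation and the
             * [start, start + len) data reservation */
            __btrfs_delalloc_release_space(inode, page_start,
                                           PAGE_CACHE_SIZE);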
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
Add new function __btrfs_delalloc_release_space() to handle the error case.
---
fs/btrfs/ctree.h | 2 ++
fs/btrfs/extent-tree.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 61 insertions(+)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 19450a1..4221bfd 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3473,7 +3473,9 @@ void btrfs_subvolume_release_metadata(struct btrfs_root *root,
int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes);
void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes);
int btrfs_delalloc_reserve_space(struct inode *inode, u64 num_bytes);
+int __btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len);
void btrfs_delalloc_release_space(struct inode *inode, u64 num_bytes);
+void __btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len);
void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type);
struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root,
unsigned short type);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f4b9db8..32455e0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5723,6 +5723,44 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
}
/**
+ * __btrfs_delalloc_reserve_space - reserve data and metadata space for
+ * delalloc
+ * @inode: inode we're writing to
+ * @start: start of the range we are writing to
+ * @len: length of the range we are writing to
+ *
+ * TODO: This function will finally replace old btrfs_delalloc_reserve_space()
+ *
+ * This will do the following things
+ *
+ * o reserve space in the data space info for len bytes
+ * and precisely reserve the corresponding qgroup space
+ * (Done in check_data_free_space)
+ *
+ * o reserve space for the metadata, based on the number of outstanding
+ * extents and how many csums will be needed;
+ * also reserve metadata space in a per-root over-reserve manner
+ * o add to the inode's delalloc_bytes
+ * o add it to the fs_info's delalloc inodes list.
+ * (The above three are all done in delalloc_reserve_metadata)
+ *
+ * Return 0 for success
+ * Return <0 for error (-ENOSPC or -EDQUOT)
+ */
+int __btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len)
+{
+ int ret;
+
+ ret = __btrfs_check_data_free_space(inode, start, len);
+ if (ret < 0)
+ return ret;
+ ret = btrfs_delalloc_reserve_metadata(inode, len);
+ if (ret < 0)
+ __btrfs_free_reserved_data_space(inode, start, len);
+ return ret;
+}
+
+/**
* btrfs_delalloc_reserve_space - reserve data and metadata space for delalloc
* @inode: inode we're writing to
* @num_bytes: the number of bytes we want to allocate
@@ -5755,6 +5793,27 @@ int btrfs_delalloc_reserve_space(struct inode *inode, u64 num_bytes)
}
/**
+ * __btrfs_delalloc_release_space - release data and metadata space for delalloc
+ * @inode: inode we're releasing space for
+ * @start: start position of the space already reserved
+ * @len: the len of the space already reserved
+ *
+ * This must be matched with a call to btrfs_delalloc_reserve_space. This is
+ * called in the case that we don't need the metadata AND data reservations
+ * anymore, e.g. when there is an error or we insert an inline extent.
+ *
+ * This function will release the metadata space that was not used and will
+ * decrement ->delalloc_bytes and remove it from the fs_info delalloc_inodes
+ * list if there are no delalloc bytes left.
+ * Also it will handle the qgroup reserved space.
+ */
+void __btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len)
+{
+ btrfs_delalloc_release_metadata(inode, len);
+ __btrfs_free_reserved_data_space(inode, start, len);
+}
+
+/**
* btrfs_delalloc_release_space - release data and metadata space for delalloc
* @inode: inode we're releasing space for
* @num_bytes: the number of bytes we want to free up
--
2.6.1
* [PATCH v2 16/23] btrfs: extent-tree: Switch to new delalloc space reserve and release
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (14 preceding siblings ...)
2015-10-09 2:25 ` [PATCH v2 15/23] btrfs: extent-tree: Add new version of btrfs_delalloc_reserve/release_space Qu Wenruo
@ 2015-10-09 2:25 ` Qu Wenruo
2015-10-09 2:30 ` [PATCH v2 18/23] btrfs: qgroup: Add handler for NOCOW and inline Qu Wenruo
` (7 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:25 UTC (permalink / raw)
To: linux-btrfs
Use new __btrfs_delalloc_reserve_space() and
__btrfs_delalloc_release_space() to reserve and release space for
delalloc.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
Also use __btrfs_delalloc_release_space() function.
---
fs/btrfs/file.c | 5 +++--
fs/btrfs/inode-map.c | 6 +++---
fs/btrfs/inode.c | 38 +++++++++++++++++++++++---------------
fs/btrfs/ioctl.c | 14 +++++++++-----
4 files changed, 38 insertions(+), 25 deletions(-)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 142b217..bf4d5fb 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1611,7 +1611,7 @@ again:
btrfs_delalloc_release_metadata(inode,
release_bytes);
else
- btrfs_delalloc_release_space(inode,
+ __btrfs_delalloc_release_space(inode, pos,
release_bytes);
}
@@ -1664,7 +1664,8 @@ again:
btrfs_end_write_no_snapshoting(root);
btrfs_delalloc_release_metadata(inode, release_bytes);
} else {
- btrfs_delalloc_release_space(inode, release_bytes);
+ __btrfs_delalloc_release_space(inode, pos,
+ release_bytes);
}
}
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index d4a582a..78bc09c 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -488,17 +488,17 @@ again:
/* Just to make sure we have enough space */
prealloc += 8 * PAGE_CACHE_SIZE;
- ret = btrfs_delalloc_reserve_space(inode, prealloc);
+ ret = __btrfs_delalloc_reserve_space(inode, 0, prealloc);
if (ret)
goto out_put;
ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
prealloc, prealloc, &alloc_hint);
if (ret) {
- btrfs_delalloc_release_space(inode, prealloc);
+ __btrfs_delalloc_release_space(inode, 0, prealloc);
goto out_put;
}
- btrfs_free_reserved_data_space(inode, prealloc);
+ __btrfs_free_reserved_data_space(inode, 0, prealloc);
ret = btrfs_write_out_ino_cache(root, trans, path, inode);
out_put:
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8ca2993..38a0fb9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1766,7 +1766,8 @@ static void btrfs_clear_bit_hook(struct inode *inode,
if (root->root_key.objectid != BTRFS_DATA_RELOC_TREE_OBJECTID
&& do_list && !(state->state & EXTENT_NORESERVE))
- btrfs_free_reserved_data_space(inode, len);
+ __btrfs_free_reserved_data_space(inode, state->start,
+ len);
__percpu_counter_add(&root->fs_info->delalloc_bytes, -len,
root->fs_info->delalloc_batch);
@@ -1985,7 +1986,8 @@ again:
goto again;
}
- ret = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE);
+ ret = __btrfs_delalloc_reserve_space(inode, page_start,
+ PAGE_CACHE_SIZE);
if (ret) {
mapping_set_error(page->mapping, ret);
end_extent_writepage(page, ret, page_start, page_end);
@@ -4581,14 +4583,17 @@ int btrfs_truncate_page(struct inode *inode, loff_t from, loff_t len,
if ((offset & (blocksize - 1)) == 0 &&
(!len || ((len & (blocksize - 1)) == 0)))
goto out;
- ret = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE);
+ ret = __btrfs_delalloc_reserve_space(inode,
+ round_down(from, PAGE_CACHE_SIZE), PAGE_CACHE_SIZE);
if (ret)
goto out;
again:
page = find_or_create_page(mapping, index, mask);
if (!page) {
- btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE);
+ __btrfs_delalloc_release_space(inode,
+ round_down(from, PAGE_CACHE_SIZE),
+ PAGE_CACHE_SIZE);
ret = -ENOMEM;
goto out;
}
@@ -4656,7 +4661,8 @@ again:
out_unlock:
if (ret)
- btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE);
+ __btrfs_delalloc_release_space(inode, page_start,
+ PAGE_CACHE_SIZE);
unlock_page(page);
page_cache_release(page);
out:
@@ -7587,7 +7593,7 @@ unlock:
spin_unlock(&BTRFS_I(inode)->lock);
}
- btrfs_free_reserved_data_space(inode, len);
+ __btrfs_free_reserved_data_space(inode, start, len);
WARN_ON(dio_data->reserve < len);
dio_data->reserve -= len;
current->journal_info = dio_data;
@@ -8380,7 +8386,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
mutex_unlock(&inode->i_mutex);
relock = true;
}
- ret = btrfs_delalloc_reserve_space(inode, count);
+ ret = __btrfs_delalloc_reserve_space(inode, offset, count);
if (ret)
goto out;
dio_data.outstanding_extents = div64_u64(count +
@@ -8409,11 +8415,11 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
current->journal_info = NULL;
if (ret < 0 && ret != -EIOCBQUEUED) {
if (dio_data.reserve)
- btrfs_delalloc_release_space(inode,
- dio_data.reserve);
+ __btrfs_delalloc_release_space(inode, offset,
+ dio_data.reserve);
} else if (ret >= 0 && (size_t)ret < count)
- btrfs_delalloc_release_space(inode,
- count - (size_t)ret);
+ __btrfs_delalloc_release_space(inode, offset,
+ count - (size_t)ret);
}
out:
if (wakeup)
@@ -8621,7 +8627,11 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
u64 page_end;
sb_start_pagefault(inode->i_sb);
- ret = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE);
+ page_start = page_offset(page);
+ page_end = page_start + PAGE_CACHE_SIZE - 1;
+
+ ret = __btrfs_delalloc_reserve_space(inode, page_start,
+ PAGE_CACHE_SIZE);
if (!ret) {
ret = file_update_time(vma->vm_file);
reserved = 1;
@@ -8640,8 +8650,6 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
again:
lock_page(page);
size = i_size_read(inode);
- page_start = page_offset(page);
- page_end = page_start + PAGE_CACHE_SIZE - 1;
if ((page->mapping != inode->i_mapping) ||
(page_start >= size)) {
@@ -8718,7 +8726,7 @@ out_unlock:
}
unlock_page(page);
out:
- btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE);
+ __btrfs_delalloc_release_space(inode, page_start, PAGE_CACHE_SIZE);
out_noreserve:
sb_end_pagefault(inode->i_sb);
return ret;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0adf542..3158b0f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1119,8 +1119,9 @@ static int cluster_pages_for_defrag(struct inode *inode,
page_cnt = min_t(u64, (u64)num_pages, (u64)file_end - start_index + 1);
- ret = btrfs_delalloc_reserve_space(inode,
- page_cnt << PAGE_CACHE_SHIFT);
+ ret = __btrfs_delalloc_reserve_space(inode,
+ start_index << PAGE_CACHE_SHIFT,
+ page_cnt << PAGE_CACHE_SHIFT);
if (ret)
return ret;
i_done = 0;
@@ -1209,8 +1210,9 @@ again:
spin_lock(&BTRFS_I(inode)->lock);
BTRFS_I(inode)->outstanding_extents++;
spin_unlock(&BTRFS_I(inode)->lock);
- btrfs_delalloc_release_space(inode,
- (page_cnt - i_done) << PAGE_CACHE_SHIFT);
+ __btrfs_delalloc_release_space(inode,
+ start_index << PAGE_CACHE_SHIFT,
+ (page_cnt - i_done) << PAGE_CACHE_SHIFT);
}
@@ -1235,7 +1237,9 @@ out:
unlock_page(pages[i]);
page_cache_release(pages[i]);
}
- btrfs_delalloc_release_space(inode, page_cnt << PAGE_CACHE_SHIFT);
+ __btrfs_delalloc_release_space(inode,
+ start_index << PAGE_CACHE_SHIFT,
+ page_cnt << PAGE_CACHE_SHIFT);
return ret;
}
--
2.6.1
* [PATCH v2 18/23] btrfs: qgroup: Add handler for NOCOW and inline
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (15 preceding siblings ...)
2015-10-09 2:25 ` [PATCH v2 16/23] btrfs: extent-tree: Switch to new delalloc space reserve and release Qu Wenruo
@ 2015-10-09 2:30 ` Qu Wenruo
2015-10-09 2:30 ` [PATCH v2 19/23] btrfs: Add handler for invalidate page Qu Wenruo
` (6 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:30 UTC (permalink / raw)
To: linux-btrfs
For the NOCOW and inline cases, no delayed_ref is created for them, so we
should free their reserved data space at the proper time
(btrfs_finish_ordered_io() for NOCOW and cow_file_range_inline() for inline).
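For example, a 4K NOCOW write goes like this (numbers illustrative):
1) Reserve: the data rsv map records [0, 4K), reserved 4K.
2) Writeback: the range is written NOCOW, so no delayed_ref is queued
and nothing would free the reservation later.
3) btrfs_finish_ordered_io(): btrfs_qgroup_free_data(inode,
ordered_extent->file_offset, ordered_extent->len) drops both the record
and the 4K reservation.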
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
fs/btrfs/extent-tree.c | 7 ++++++-
fs/btrfs/inode.c | 15 +++++++++++++++
2 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1dadbba..765f7e0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4056,7 +4056,12 @@ int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
if (ret < 0)
return ret;
- /* Use new btrfs_qgroup_reserve_data to reserve precise data space */
+ /*
+ * Use new btrfs_qgroup_reserve_data to reserve precise data space
+ *
+ * TODO: Find a good way to avoid reserving data space for NOCOW
+ * ranges, without hurting performance when quota is disabled.
+ */
ret = btrfs_qgroup_reserve_data(inode, start, len);
return ret;
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 84c31dd..ee0b239 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -310,6 +310,13 @@ static noinline int cow_file_range_inline(struct btrfs_root *root,
btrfs_delalloc_release_metadata(inode, end + 1 - start);
btrfs_drop_extent_cache(inode, start, aligned_end - 1, 0);
out:
+ /*
+ * Don't forget to free the reserved space: an inlined extent
+ * won't be counted as a data extent, so free the space
+ * directly here.
+ * And at reserve time it's always aligned to page size, so
+ * just free one page here.
+ */
+ btrfs_qgroup_free_data(inode, 0, PAGE_CACHE_SIZE);
btrfs_free_path(path);
btrfs_end_transaction(trans, root);
return ret;
@@ -2832,6 +2839,14 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
if (test_bit(BTRFS_ORDERED_NOCOW, &ordered_extent->flags)) {
BUG_ON(!list_empty(&ordered_extent->list)); /* Logic error */
+
+ /*
+ * For the mwrite (mmap + memset to write) case, we still
+ * reserve space for the NOCOW range.
+ * As NOCOW won't cause a new delayed ref, just free the space.
+ */
+ btrfs_qgroup_free_data(inode, ordered_extent->file_offset,
+ ordered_extent->len);
btrfs_ordered_update_i_size(inode, 0, ordered_extent);
if (nolock)
trans = btrfs_join_transaction_nolock(root);
--
2.6.1
* [PATCH v2 19/23] btrfs: Add handler for invalidate page
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (16 preceding siblings ...)
2015-10-09 2:30 ` [PATCH v2 18/23] btrfs: qgroup: Add handler for NOCOW and inline Qu Wenruo
@ 2015-10-09 2:30 ` Qu Wenruo
2015-10-09 2:34 ` [PATCH v2 20/23] btrfs: qgroup: Add new trace point for qgroup data reserve Qu Wenruo
` (5 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:30 UTC (permalink / raw)
To: linux-btrfs
For btrfs_invalidatepage() and its variant evict_inode_truncate_pages(),
there can be pages that never reach disk.
In that case, their reserved space will be released and freed by neither
finish_ordered_io() nor the delayed_ref handler.
So we must free their qgroup reserved space, or we will leak reserved
space again.
So this patch calls btrfs_qgroup_free_data() from both
btrfs_invalidatepage() and evict_inode_truncate_pages().
And by the design of the new btrfs_qgroup_reserve/free_data(), reserved
space will only be reserved or freed once, so for pages which have
already been flushed to disk, their reserved space will be released and
freed by the delayed_ref handler.
Double free won't be a problem.
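For example, when a 4K page is invalidated (numbers illustrative):
1) Already flushed: its range was released from the data rsv map at
writeback time, so btrfs_qgroup_free_data() finds nothing to free, and
the reservation is finally freed by the delayed_ref handler.
2) Never flushed (still DELALLOC): the range is still in the map, so
btrfs_qgroup_free_data() frees the 4K reservation right here.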
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
Newly introduced
---
fs/btrfs/inode.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ee0b239..85b06d1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5075,6 +5075,18 @@ static void evict_inode_truncate_pages(struct inode *inode)
spin_unlock(&io_tree->lock);
lock_extent_bits(io_tree, start, end, 0, &cached_state);
+
+ /*
+ * If it still has the DELALLOC flag, the extent didn't reach disk,
+ * and its reserved space won't be freed by the delayed_ref handler.
+ * So we need to free its reserved space here.
+ * (Refer to the comment in btrfs_invalidatepage, case 2)
+ *
+ * Note, end is the bytenr of the last byte, so we need the + 1 here.
+ */
+ if (state->state & EXTENT_DELALLOC)
+ btrfs_qgroup_free_data(inode, start, end - start + 1);
+
clear_extent_bit(io_tree, start, end,
EXTENT_LOCKED | EXTENT_DIRTY |
EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING |
@@ -8592,6 +8604,18 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
}
}
+ /*
+ * Qgroup reserved space handler
+ * Page here will be either
+ * 1) Already written to disk
+ * In this case, its reserved space is released from data rsv map
+ * and will be freed by delayed_ref handler finally.
+ * So even if we call qgroup_free_data(), it won't decrease the
+ * reserved space.
+ * 2) Not written to disk
+ * This means the reserved space should be freed here.
+ */
+ btrfs_qgroup_free_data(inode, page_start, PAGE_CACHE_SIZE);
if (!inode_evicting) {
clear_extent_bit(tree, page_start, page_end,
EXTENT_LOCKED | EXTENT_DIRTY |
--
2.6.1
* [PATCH v2 20/23] btrfs: qgroup: Add new trace point for qgroup data reserve
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (17 preceding siblings ...)
2015-10-09 2:30 ` [PATCH v2 19/23] btrfs: Add handler for invalidate page Qu Wenruo
@ 2015-10-09 2:34 ` Qu Wenruo
2015-10-09 2:34 ` [PATCH v2 21/23] btrfs: fallocate: Add support to accurate qgroup reserve Qu Wenruo
` (4 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:34 UTC (permalink / raw)
To: linux-btrfs
Now each qgroup data reserve operation has its own ftrace event for
better debugging.
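With the events enabled, a data reserve shows up in the trace output
roughly like this (hypothetical values; the format follows the TP_printk
below):

    btrfs_qgroup_reserve_data: root=5, ino=257, start=0, len=4096, reserved=4096, op=reserve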
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
Newly introduced
---
fs/btrfs/qgroup.c | 15 +++++-
fs/btrfs/qgroup.h | 8 +++
include/trace/events/btrfs.h | 113 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 134 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 6f397ce..54ba9fc 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2818,6 +2818,7 @@ int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len)
struct btrfs_qgroup_data_rsv_map *reserve_map;
struct data_rsv_range *tmp = NULL;
struct ulist *insert_list;
+ u64 reserved = 0;
int ret;
if (!root->fs_info->quota_enabled || !is_fstree(root->objectid) ||
@@ -2841,7 +2842,9 @@ int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len)
spin_lock(&reserve_map->lock);
ret = reserve_data_range(root, reserve_map, tmp, insert_list, start,
- len, NULL);
+ len, &reserved);
+ trace_btrfs_qgroup_reserve_data(inode, start, len, reserved,
+ QGROUP_RESERVE);
/*
* For error and already exists case, free tmp memory.
* For tmp used case, set ret to 0, as some careless
@@ -2995,6 +2998,7 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
struct data_rsv_range *tmp;
struct btrfs_qgroup_data_rsv_map *map;
u64 reserved = 0;
+ int trace_op = QGROUP_RELEASE;
int ret;
spin_lock(&BTRFS_I(inode)->qgroup_init_lock);
@@ -3011,8 +3015,11 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
/* release_data_range() won't fail only check if memory is used */
if (ret == 0)
kfree(tmp);
- if (free_reserved)
+ if (free_reserved) {
qgroup_free(BTRFS_I(inode)->root, reserved);
+ trace_op = QGROUP_FREE;
+ }
+ trace_btrfs_qgroup_release_data(inode, start, len, reserved, trace_op);
spin_unlock(&map->lock);
return 0;
}
@@ -3084,6 +3091,7 @@ int btrfs_qgroup_init_data_rsv_map(struct inode *inode)
}
binode->qgroup_rsv_map = dirty_map;
out:
+ trace_btrfs_qgroup_init_data_rsv_map(inode, 0);
spin_unlock(&binode->qgroup_init_lock);
return 0;
}
@@ -3094,6 +3102,7 @@ void btrfs_qgroup_free_data_rsv_map(struct inode *inode)
struct btrfs_root *root = binode->root;
struct btrfs_qgroup_data_rsv_map *dirty_map = binode->qgroup_rsv_map;
struct rb_node *node;
+ u64 free_reserved = 0;
/*
* this function is called at inode destroy routine, so no concurrency
@@ -3108,6 +3117,7 @@ void btrfs_qgroup_free_data_rsv_map(struct inode *inode)
/* Reserve map should be empty, or we are leaking */
WARN_ON(dirty_map->reserved);
+ free_reserved = dirty_map->reserved;
qgroup_free(root, dirty_map->reserved);
spin_lock(&dirty_map->lock);
while ((node = rb_first(&dirty_map->root)) != NULL) {
@@ -3121,6 +3131,7 @@ void btrfs_qgroup_free_data_rsv_map(struct inode *inode)
rb_erase(node, &dirty_map->root);
kfree(range);
}
+ trace_btrfs_qgroup_free_data_rsv_map(inode, free_reserved);
spin_unlock(&dirty_map->lock);
kfree(dirty_map);
binode->qgroup_rsv_map = NULL;
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 3f6ad43..cd3e515 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -33,6 +33,13 @@ struct btrfs_qgroup_extent_record {
struct ulist *old_roots;
};
+/*
+ * For qgroup event trace points only
+ */
+#define QGROUP_RESERVE (1<<0)
+#define QGROUP_RELEASE (1<<1)
+#define QGROUP_FREE (1<<2)
+
/* For per-inode dirty range reserve */
struct btrfs_qgroup_data_rsv_map;
@@ -84,6 +91,7 @@ static inline void btrfs_qgroup_free_delayed_ref(struct btrfs_fs_info *fs_info,
u64 ref_root, u64 num_bytes)
{
btrfs_qgroup_free_refroot(fs_info, ref_root, num_bytes);
+ trace_btrfs_qgroup_free_delayed_ref(ref_root, num_bytes);
}
void assert_qgroups_uptodate(struct btrfs_trans_handle *trans);
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 0b73af9..b4473da 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1117,6 +1117,119 @@ DEFINE_EVENT(btrfs__workqueue_done, btrfs_workqueue_destroy,
TP_ARGS(wq)
);
+DECLARE_EVENT_CLASS(btrfs__qgroup_data_map,
+
+ TP_PROTO(struct inode *inode, u64 free_reserved),
+
+ TP_ARGS(inode, free_reserved),
+
+ TP_STRUCT__entry(
+ __field( u64, rootid )
+ __field( unsigned long, ino )
+ __field( u64, free_reserved )
+ ),
+
+ TP_fast_assign(
+ __entry->rootid = BTRFS_I(inode)->root->objectid;
+ __entry->ino = inode->i_ino;
+ __entry->free_reserved = free_reserved;
+ ),
+
+ TP_printk("rootid=%llu, ino=%lu, free_reserved=%llu",
+ __entry->rootid, __entry->ino, __entry->free_reserved)
+);
+
+DEFINE_EVENT(btrfs__qgroup_data_map, btrfs_qgroup_init_data_rsv_map,
+
+ TP_PROTO(struct inode *inode, u64 free_reserved),
+
+ TP_ARGS(inode, free_reserved)
+);
+
+DEFINE_EVENT(btrfs__qgroup_data_map, btrfs_qgroup_free_data_rsv_map,
+
+ TP_PROTO(struct inode *inode, u64 free_reserved),
+
+ TP_ARGS(inode, free_reserved)
+);
+
+#define BTRFS_QGROUP_OPERATIONS \
+ { QGROUP_RESERVE, "reserve" }, \
+ { QGROUP_RELEASE, "release" }, \
+ { QGROUP_FREE, "free" }
+
+DECLARE_EVENT_CLASS(btrfs__qgroup_rsv_data,
+
+ TP_PROTO(struct inode *inode, u64 start, u64 len, u64 reserved, int op),
+
+ TP_ARGS(inode, start, len, reserved, op),
+
+ TP_STRUCT__entry(
+ __field( u64, rootid )
+ __field( unsigned long, ino )
+ __field( u64, start )
+ __field( u64, len )
+ __field( u64, reserved )
+ __field( int, op )
+ ),
+
+ TP_fast_assign(
+ __entry->rootid = BTRFS_I(inode)->root->objectid;
+ __entry->ino = inode->i_ino;
+ __entry->start = start;
+ __entry->len = len;
+ __entry->reserved = reserved;
+ __entry->op = op;
+ ),
+
+ TP_printk("root=%llu, ino=%lu, start=%llu, len=%llu, reserved=%llu, op=%s",
+ __entry->rootid, __entry->ino, __entry->start, __entry->len,
+ __entry->reserved,
+ __print_flags((unsigned long)__entry->op, "",
+ BTRFS_QGROUP_OPERATIONS)
+ )
+);
+
+DEFINE_EVENT(btrfs__qgroup_rsv_data, btrfs_qgroup_reserve_data,
+
+ TP_PROTO(struct inode *inode, u64 start, u64 len, u64 reserved, int op),
+
+ TP_ARGS(inode, start, len, reserved, op)
+);
+
+DEFINE_EVENT(btrfs__qgroup_rsv_data, btrfs_qgroup_release_data,
+
+ TP_PROTO(struct inode *inode, u64 start, u64 len, u64 reserved, int op),
+
+ TP_ARGS(inode, start, len, reserved, op)
+);
+
+DECLARE_EVENT_CLASS(btrfs__qgroup_delayed_ref,
+
+ TP_PROTO(u64 ref_root, u64 reserved),
+
+ TP_ARGS(ref_root, reserved),
+
+ TP_STRUCT__entry(
+ __field( u64, ref_root )
+ __field( u64, reserved )
+ ),
+
+ TP_fast_assign(
+ __entry->ref_root = ref_root;
+ __entry->reserved = reserved;
+ ),
+
+ TP_printk("root=%llu, reserved=%llu, op=free",
+ __entry->ref_root, __entry->reserved)
+);
+
+DEFINE_EVENT(btrfs__qgroup_delayed_ref, btrfs_qgroup_free_delayed_ref,
+
+ TP_PROTO(u64 ref_root, u64 reserved),
+
+ TP_ARGS(ref_root, reserved)
+);
#endif /* _TRACE_BTRFS_H */
/* This part must be outside protection */
--
2.6.1
* [PATCH v2 21/23] btrfs: fallocate: Add support to accurate qgroup reserve
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (18 preceding siblings ...)
2015-10-09 2:34 ` [PATCH v2 20/23] btrfs: qgroup: Add new trace point for qgroup data reserve Qu Wenruo
@ 2015-10-09 2:34 ` Qu Wenruo
2015-10-09 2:34 ` [PATCH v2 22/23] btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode size Qu Wenruo
` (3 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:34 UTC (permalink / raw)
To: linux-btrfs
Now fallocate will do an accurate qgroup reserved space check, unlike the
old method, which would always reserve the whole length of the range.
With this patch, fallocate will:
1) Iterate the desired range and mark it in the data rsv map
Only ranges which are going to be allocated will be recorded in the
data rsv map and have their space reserved.
Already allocated ranges (normal/prealloc extents) will be skipped.
Also, record the marked ranges into a new list for later use.
2) If 1) succeeded, do the real file extent allocation.
At file extent allocation time, the corresponding range will be
removed from the data rsv map.
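In sketch form (simplified from the diff below; error handling omitted,
and should_alloc() stands in for the EXTENT_MAP_HOLE / prealloc beyond
i_size test):

    /* Phase 1: reserve only the ranges that will really be allocated */
    while (cur_offset < alloc_end) {
            em = btrfs_get_extent(inode, NULL, 0, cur_offset,
                                  alloc_end - cur_offset, 0);
            if (should_alloc(em, cur_offset)) {
                    add_falloc_range(&reserve_list, cur_offset,
                                     last_byte - cur_offset);
                    btrfs_qgroup_reserve_data(inode, cur_offset,
                                              last_byte - cur_offset);
            }
            cur_offset = last_byte;
    }

    /* Phase 2: allocate exactly the recorded ranges */
    list_for_each_entry_safe(range, tmp, &reserve_list, list)
            btrfs_prealloc_file_range(inode, mode, range->start,
                                      range->len, 1 << inode->i_blkbits,
                                      offset + len, &alloc_hint);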
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
Fix comment typo
Add missing cleanup for falloc list
---
fs/btrfs/file.c | 159 +++++++++++++++++++++++++++++++++++++++++---------------
1 file changed, 116 insertions(+), 43 deletions(-)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index c97b24f..d638d34 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2545,17 +2545,61 @@ out_only_mutex:
return err;
}
+/* Helper structure to record which range is already reserved */
+struct falloc_range {
+ struct list_head list;
+ u64 start;
+ u64 len;
+};
+
+/*
+ * Helper function to add falloc range
+ *
+ * Caller should have locked the larger extent range containing
+ * [start, start + len)
+ */
+static int add_falloc_range(struct list_head *head, u64 start, u64 len)
+{
+ struct falloc_range *prev = NULL;
+ struct falloc_range *range = NULL;
+
+ if (list_empty(head))
+ goto insert;
+
+ /*
+ * As fallocate iterates in bytenr order, we only need to check
+ * the last range.
+ */
+ prev = list_entry(head->prev, struct falloc_range, list);
+ if (prev->start + prev->len == start) {
+ prev->len += len;
+ return 0;
+ }
+insert:
+ range = kmalloc(sizeof(*range), GFP_NOFS);
+ if (!range)
+ return -ENOMEM;
+ range->start = start;
+ range->len = len;
+ list_add_tail(&range->list, head);
+ return 0;
+}
+
static long btrfs_fallocate(struct file *file, int mode,
loff_t offset, loff_t len)
{
struct inode *inode = file_inode(file);
struct extent_state *cached_state = NULL;
+ struct falloc_range *range;
+ struct falloc_range *tmp;
+ struct list_head reserve_list;
u64 cur_offset;
u64 last_byte;
u64 alloc_start;
u64 alloc_end;
u64 alloc_hint = 0;
u64 locked_end;
+ u64 actual_end = 0;
struct extent_map *em;
int blocksize = BTRFS_I(inode)->root->sectorsize;
int ret;
@@ -2571,13 +2615,11 @@ static long btrfs_fallocate(struct file *file, int mode,
return btrfs_punch_hole(inode, offset, len);
/*
- * Make sure we have enough space before we do the
- * allocation.
- * XXX: The behavior must be changed to do the accurate check
- * first and only then reserve the data space.
+ * Only trigger disk allocation, don't trigger qgroup reserve
+ *
+ * For qgroup space, it will be checked later.
*/
- ret = btrfs_check_data_free_space(inode, alloc_start,
- alloc_end - alloc_start);
+ ret = btrfs_alloc_data_chunk_ondemand(inode, alloc_end - alloc_start);
if (ret)
return ret;
@@ -2586,6 +2628,13 @@ static long btrfs_fallocate(struct file *file, int mode,
if (ret)
goto out;
+ /*
+ * TODO: Move these two operations to after the accurate reserved
+ * space check, or fallocate can still fail but leave the page
+ * truncated or the size expanded.
+ *
+ * But that's a minor problem and won't do much harm.
+ */
if (alloc_start > inode->i_size) {
ret = btrfs_cont_expand(inode, i_size_read(inode),
alloc_start);
@@ -2644,10 +2693,10 @@ static long btrfs_fallocate(struct file *file, int mode,
}
}
+ /* First, check if we exceed the qgroup limit */
+ INIT_LIST_HEAD(&reserve_list);
cur_offset = alloc_start;
while (1) {
- u64 actual_end;
-
em = btrfs_get_extent(inode, NULL, 0, cur_offset,
alloc_end - cur_offset, 0);
if (IS_ERR_OR_NULL(em)) {
@@ -2660,54 +2709,78 @@ static long btrfs_fallocate(struct file *file, int mode,
last_byte = min(extent_map_end(em), alloc_end);
actual_end = min_t(u64, extent_map_end(em), offset + len);
last_byte = ALIGN(last_byte, blocksize);
-
if (em->block_start == EXTENT_MAP_HOLE ||
(cur_offset >= inode->i_size &&
!test_bit(EXTENT_FLAG_PREALLOC, &em->flags))) {
- ret = btrfs_prealloc_file_range(inode, mode, cur_offset,
- last_byte - cur_offset,
- 1 << inode->i_blkbits,
- offset + len,
- &alloc_hint);
- } else if (actual_end > inode->i_size &&
- !(mode & FALLOC_FL_KEEP_SIZE)) {
- struct btrfs_trans_handle *trans;
- struct btrfs_root *root = BTRFS_I(inode)->root;
-
- /*
- * We didn't need to allocate any more space, but we
- * still extended the size of the file so we need to
- * update i_size and the inode item.
- */
- trans = btrfs_start_transaction(root, 1);
- if (IS_ERR(trans)) {
- ret = PTR_ERR(trans);
- } else {
- inode->i_ctime = CURRENT_TIME;
- i_size_write(inode, actual_end);
- btrfs_ordered_update_i_size(inode, actual_end,
- NULL);
- ret = btrfs_update_inode(trans, root, inode);
- if (ret)
- btrfs_end_transaction(trans, root);
- else
- ret = btrfs_end_transaction(trans,
- root);
+ ret = add_falloc_range(&reserve_list, cur_offset,
+ last_byte - cur_offset);
+ if (ret < 0) {
+ free_extent_map(em);
+ break;
}
+ ret = btrfs_qgroup_reserve_data(inode, cur_offset,
+ last_byte - cur_offset);
+ if (ret < 0)
+ break;
}
free_extent_map(em);
- if (ret < 0)
- break;
-
cur_offset = last_byte;
- if (cur_offset >= alloc_end) {
- ret = 0;
+ if (cur_offset >= alloc_end)
break;
+ }
+
+ /*
+ * If ret is still 0, it means we're OK to fallocate.
+ * Otherwise just clean up the list and exit.
+ */
+ list_for_each_entry_safe(range, tmp, &reserve_list, list) {
+ if (!ret)
+ ret = btrfs_prealloc_file_range(inode, mode,
+ range->start,
+ range->len, 1 << inode->i_blkbits,
+ offset + len, &alloc_hint);
+ list_del(&range->list);
+ kfree(range);
+ }
+ if (ret < 0)
+ goto out_unlock;
+
+ if (actual_end > inode->i_size &&
+ !(mode & FALLOC_FL_KEEP_SIZE)) {
+ struct btrfs_trans_handle *trans;
+ struct btrfs_root *root = BTRFS_I(inode)->root;
+
+ /*
+ * We didn't need to allocate any more space, but we
+ * still extended the size of the file so we need to
+ * update i_size and the inode item.
+ */
+ trans = btrfs_start_transaction(root, 1);
+ if (IS_ERR(trans)) {
+ ret = PTR_ERR(trans);
+ } else {
+ inode->i_ctime = CURRENT_TIME;
+ i_size_write(inode, actual_end);
+ btrfs_ordered_update_i_size(inode, actual_end, NULL);
+ ret = btrfs_update_inode(trans, root, inode);
+ if (ret)
+ btrfs_end_transaction(trans, root);
+ else
+ ret = btrfs_end_transaction(trans, root);
}
}
+out_unlock:
unlock_extent_cached(&BTRFS_I(inode)->io_tree, alloc_start, locked_end,
&cached_state, GFP_NOFS);
out:
+ /*
+ * As we have waited for the extent range, the data_rsv_map must be
+ * empty in this range: written data ranges are released from it,
+ * and for preallocated extents the space is also released when
+ * their metadata is written.
+ * So this call is purely a cleanup.
+ */
+ btrfs_qgroup_free_data(inode, alloc_start, alloc_end - alloc_start);
mutex_unlock(&inode->i_mutex);
/* Let go of our reservation. */
btrfs_free_reserved_data_space(inode, alloc_start,
--
2.6.1
* [PATCH v2 22/23] btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode size
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (19 preceding siblings ...)
2015-10-09 2:34 ` [PATCH v2 21/23] btrfs: fallocate: Add support to accurate qgroup reserve Qu Wenruo
@ 2015-10-09 2:34 ` Qu Wenruo
2015-10-09 2:34 ` [PATCH v2 23/23] btrfs: qgroup: Avoid calling btrfs_free_reserved_data_space in clear_bit_hook Qu Wenruo
` (2 subsequent siblings)
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:34 UTC (permalink / raw)
To: linux-btrfs
The current code will always truncate the tailing page if its alloc_start
is smaller than the inode size.
This behavior will cause a lot of unneeded page-sized COW extents.
This patch avoids that problem.
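For example, fallocate(fd, 0, 0, 4096) on an 8K file: offset + len (4K)
does not exceed i_size (8K), so there is no need to zero out, and thus
COW, the page that i_size lands in; with this patch that branch is
skipped.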
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
Newly introduced
---
fs/btrfs/file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index d638d34..ad30b37 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2640,7 +2640,7 @@ static long btrfs_fallocate(struct file *file, int mode,
alloc_start);
if (ret)
goto out;
- } else {
+ } else if (offset + len > inode->i_size) {
/*
* If we are fallocating from the end of the file onward we
* need to zero out the end of the page if i_size lands in the
--
2.6.1
* [PATCH v2 23/23] btrfs: qgroup: Avoid calling btrfs_free_reserved_data_space in clear_bit_hook
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (20 preceding siblings ...)
2015-10-09 2:34 ` [PATCH v2 22/23] btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode size Qu Wenruo
@ 2015-10-09 2:34 ` Qu Wenruo
2015-10-09 4:08 ` [PATCH v2 17/23] btrfs: qgroup: Cleanup old inaccurate facilities Qu Wenruo
2015-10-09 4:36 ` [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Josef Bacik
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 2:34 UTC (permalink / raw)
To: linux-btrfs
In clear_bit_hook, qgroup reserved data is already handled quite well,
either released by finish_ordered_io or by invalidatepage.
So calling btrfs_qgroup_free_data() here is completely meaningless, and
since btrfs_qgroup_free_data() may sleep to allocate memory, it would
cause a lockdep warning.
This patch adds a new function,
btrfs_free_reserved_data_space_noquota(), for clear_bit_hook() to use,
silencing the lockdep warning.
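The resulting split (a sketch; the full version is in the diff below):

    /* safe where we can't sleep: only adjusts space_info accounting */
    void btrfs_free_reserved_data_space_noquota(struct inode *inode,
                                                u64 start, u64 len);

    /* full version: also frees the qgroup reservation, may sleep */
    void btrfs_free_reserved_data_space(struct inode *inode, u64 start,
                                        u64 len)
    {
            btrfs_free_reserved_data_space_noquota(inode, start, len);
            btrfs_qgroup_free_data(inode, start, len);
    }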
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
fs/btrfs/ctree.h | 2 ++
fs/btrfs/extent-tree.c | 28 ++++++++++++++++++----------
fs/btrfs/inode.c | 4 ++--
3 files changed, 22 insertions(+), 12 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f20b901..3970426 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3455,6 +3455,8 @@ enum btrfs_reserve_flush_enum {
int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len);
int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes);
void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len);
+void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
+ u64 len);
void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
struct btrfs_root *root);
void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 765f7e0..af221eb 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4070,10 +4070,12 @@ int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
* Called if we need to clear a data reservation for this inode,
* normally in an error case.
*
- * This one will handle the per-inode data rsv map for the accurate
- * reserved space framework.
+ * This one will *NOT* use the accurate qgroup reserved space API; it is
+ * just for cases where we can't sleep and are sure it won't affect the
+ * qgroup reserved space, like clear_bit_hook().
*/
-void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len)
+void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
+ u64 len)
{
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_space_info *data_sinfo;
@@ -4083,13 +4085,6 @@ void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len)
round_down(start, root->sectorsize);
start = round_down(start, root->sectorsize);
- /*
- * Free any reserved qgroup data space first
- * As it will alloc memory, we can't do it with data sinfo
- * spinlock hold.
- */
- btrfs_qgroup_free_data(inode, start, len);
-
data_sinfo = root->fs_info->data_sinfo;
spin_lock(&data_sinfo->lock);
if (WARN_ON(data_sinfo->bytes_may_use < len))
@@ -4101,6 +4096,19 @@ void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len)
spin_unlock(&data_sinfo->lock);
}
+/*
+ * Called if we need to clear a data reservation for this inode,
+ * normally in an error case.
+ *
+ * This one will handle the per-inode data rsv map for the accurate
+ * reserved space framework.
+ */
+void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len)
+{
+ btrfs_free_reserved_data_space_noquota(inode, start, len);
+ btrfs_qgroup_free_data(inode, start, len);
+}
+
static void force_metadata_allocation(struct btrfs_fs_info *info)
{
struct list_head *head = &info->space_info;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 85b06d1..bd3935c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1773,8 +1773,8 @@ static void btrfs_clear_bit_hook(struct inode *inode,
if (root->root_key.objectid != BTRFS_DATA_RELOC_TREE_OBJECTID
&& do_list && !(state->state & EXTENT_NORESERVE))
- btrfs_free_reserved_data_space(inode, state->start,
- len);
+ btrfs_free_reserved_data_space_noquota(inode,
+ state->start, len);
__percpu_counter_add(&root->fs_info->delalloc_bytes, -len,
root->fs_info->delalloc_batch);
--
2.6.1
* [PATCH v2 17/23] btrfs: qgroup: Cleanup old inaccurate facilities
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (21 preceding siblings ...)
2015-10-09 2:34 ` [PATCH v2 23/23] btrfs: qgroup: Avoid calling btrfs_free_reserved_data_space in clear_bit_hook Qu Wenruo
@ 2015-10-09 4:08 ` Qu Wenruo
2015-10-09 4:36 ` [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Josef Bacik
23 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 4:08 UTC (permalink / raw)
To: linux-btrfs
Clean up the old facilities which use the old btrfs_qgroup_reserve()
function call, replace them with the newer versions, and remove the "__"
prefix from them.
Also, make the btrfs_qgroup_reserve/free() functions private, as they are
now only used inside the qgroup code.
Now the whole btrfs qgroup is switched to use the new reserve facilities.
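The renames, in short:

    __btrfs_check_data_free_space()    -> btrfs_check_data_free_space()
    __btrfs_free_reserved_data_space() -> btrfs_free_reserved_data_space()
    __btrfs_delalloc_reserve_space()   -> btrfs_delalloc_reserve_space()
    __btrfs_delalloc_release_space()   -> btrfs_delalloc_release_space()
    btrfs_qgroup_reserve()             -> qgroup_reserve() (now static)
    btrfs_qgroup_free()                -> qgroup_free() (now static inline)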
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
Apply newly introduced functions too.
---
fs/btrfs/ctree.h | 12 ++----
fs/btrfs/extent-tree.c | 109 +++++--------------------------------------------
fs/btrfs/file.c | 15 ++++---
fs/btrfs/inode-map.c | 6 +--
fs/btrfs/inode.c | 34 +++++++--------
fs/btrfs/ioctl.c | 6 +--
fs/btrfs/qgroup.c | 19 +++++----
fs/btrfs/qgroup.h | 8 ----
fs/btrfs/relocation.c | 8 ++--
9 files changed, 61 insertions(+), 156 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4221bfd..f20b901 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3452,11 +3452,9 @@ enum btrfs_reserve_flush_enum {
BTRFS_RESERVE_FLUSH_ALL,
};
-int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes);
-int __btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len);
+int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len);
int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes);
-void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes);
-void __btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len);
+void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len);
void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
struct btrfs_root *root);
void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans);
@@ -3472,10 +3470,8 @@ void btrfs_subvolume_release_metadata(struct btrfs_root *root,
u64 qgroup_reserved);
int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes);
void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes);
-int btrfs_delalloc_reserve_space(struct inode *inode, u64 num_bytes);
-int __btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len);
-void btrfs_delalloc_release_space(struct inode *inode, u64 num_bytes);
-void __btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len);
+int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len);
+void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len);
void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type);
struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root,
unsigned short type);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 32455e0..1dadbba 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3356,7 +3356,7 @@ again:
num_pages *= 16;
num_pages *= PAGE_CACHE_SIZE;
- ret = __btrfs_check_data_free_space(inode, 0, num_pages);
+ ret = btrfs_check_data_free_space(inode, 0, num_pages);
if (ret)
goto out_put;
@@ -3365,7 +3365,7 @@ again:
&alloc_hint);
if (!ret)
dcs = BTRFS_DC_SETUP;
- __btrfs_free_reserved_data_space(inode, 0, num_pages);
+ btrfs_free_reserved_data_space(inode, 0, num_pages);
out_put:
iput(inode);
@@ -4038,27 +4038,11 @@ commit_trans:
}
/*
- * This will check the space that the inode allocates from to make sure we have
- * enough space for bytes.
- */
-int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes)
-{
- struct btrfs_root *root = BTRFS_I(inode)->root;
- int ret;
-
- ret = btrfs_alloc_data_chunk_ondemand(inode, bytes);
- if (ret < 0)
- return ret;
- ret = btrfs_qgroup_reserve(root, write_bytes);
- return ret;
-}
-
-/*
* New check_data_free_space() with the ability for precise data reservation.
* It will replace the old btrfs_check_data_free_space(), but to keep the
* patch split manageable, add the new function first and switch callers later.
*/
-int __btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
+int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
{
struct btrfs_root *root = BTRFS_I(inode)->root;
int ret;
@@ -4078,33 +4062,13 @@ int __btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
}
/*
- * Called if we need to clear a data reservation for this inode.
- */
-void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes)
-{
- struct btrfs_root *root = BTRFS_I(inode)->root;
- struct btrfs_space_info *data_sinfo;
-
- /* make sure bytes are sectorsize aligned */
- bytes = ALIGN(bytes, root->sectorsize);
-
- data_sinfo = root->fs_info->data_sinfo;
- spin_lock(&data_sinfo->lock);
- WARN_ON(data_sinfo->bytes_may_use < bytes);
- data_sinfo->bytes_may_use -= bytes;
- trace_btrfs_space_reservation(root->fs_info, "space_info",
- data_sinfo->flags, bytes, 0);
- spin_unlock(&data_sinfo->lock);
-}
-
-/*
* Called if we need to clear a data reservation for this inode,
* normally in an error case.
*
* This one will handle the per-inode data rsv map for the accurate
* reserved space framework.
*/
-void __btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len)
+void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len)
{
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_space_info *data_sinfo;
@@ -5723,7 +5687,7 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
}
/**
- * __btrfs_delalloc_reserve_space - reserve data and metadata space for
+ * btrfs_delalloc_reserve_space - reserve data and metadata space for
* delalloc
* @inode: inode we're writing to
* @start: start range we are writing to
@@ -5747,53 +5711,21 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
* Return 0 for success
* Return <0 for error (-ENOSPC or -EDQUOT)
*/
-int __btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len)
+int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len)
{
int ret;
- ret = __btrfs_check_data_free_space(inode, start, len);
+ ret = btrfs_check_data_free_space(inode, start, len);
if (ret < 0)
return ret;
ret = btrfs_delalloc_reserve_metadata(inode, len);
if (ret < 0)
- __btrfs_free_reserved_data_space(inode, start, len);
+ btrfs_free_reserved_data_space(inode, start, len);
return ret;
}
/**
- * btrfs_delalloc_reserve_space - reserve data and metadata space for delalloc
- * @inode: inode we're writing to
- * @num_bytes: the number of bytes we want to allocate
- *
- * This will do the following things
- *
- * o reserve space in the data space info for num_bytes
- * o reserve space in the metadata space info based on number of outstanding
- * extents and how much csums will be needed
- * o add to the inodes ->delalloc_bytes
- * o add it to the fs_info's delalloc inodes list.
- *
- * This will return 0 for success and -ENOSPC if there is no space left.
- */
-int btrfs_delalloc_reserve_space(struct inode *inode, u64 num_bytes)
-{
- int ret;
-
- ret = btrfs_check_data_free_space(inode, num_bytes, num_bytes);
- if (ret)
- return ret;
-
- ret = btrfs_delalloc_reserve_metadata(inode, num_bytes);
- if (ret) {
- btrfs_free_reserved_data_space(inode, num_bytes);
- return ret;
- }
-
- return 0;
-}
-
-/**
- * __btrfs_delalloc_release_space - release data and metadata space for delalloc
+ * btrfs_delalloc_release_space - release data and metadata space for delalloc
* @inode: inode we're releasing space for
* @start: start position of the space already reserved
* @len: the len of the space already reserved
@@ -5807,29 +5739,10 @@ int btrfs_delalloc_reserve_space(struct inode *inode, u64 num_bytes)
* list if there are no delalloc bytes left.
* Also it will handle the qgroup reserved space.
*/
-void __btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len)
+void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len)
{
btrfs_delalloc_release_metadata(inode, len);
- __btrfs_free_reserved_data_space(inode, start, len);
-}
-
-/**
- * btrfs_delalloc_release_space - release data and metadata space for delalloc
- * @inode: inode we're releasing space for
- * @num_bytes: the number of bytes we want to free up
- *
- * This must be matched with a call to btrfs_delalloc_reserve_space. This is
- * called in the case that we don't need the metadata AND data reservations
- * anymore. So if there is an error or we insert an inline extent.
- *
- * This function will release the metadata space that was not used and will
- * decrement ->delalloc_bytes and remove it from the fs_info delalloc_inodes
- * list if there are no delalloc bytes left.
- */
-void btrfs_delalloc_release_space(struct inode *inode, u64 num_bytes)
-{
- btrfs_delalloc_release_metadata(inode, num_bytes);
- btrfs_free_reserved_data_space(inode, num_bytes);
+ btrfs_free_reserved_data_space(inode, start, len);
}
static int update_block_group(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index bf4d5fb..c97b24f 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1532,7 +1532,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
goto reserve_metadata;
}
}
- ret = __btrfs_check_data_free_space(inode, pos, write_bytes);
+ ret = btrfs_check_data_free_space(inode, pos, write_bytes);
if (ret < 0)
break;
@@ -1540,8 +1540,8 @@ reserve_metadata:
ret = btrfs_delalloc_reserve_metadata(inode, reserve_bytes);
if (ret) {
if (!only_release_metadata)
- __btrfs_free_reserved_data_space(inode, pos,
- write_bytes);
+ btrfs_free_reserved_data_space(inode, pos,
+ write_bytes);
else
btrfs_end_write_no_snapshoting(root);
break;
@@ -1611,7 +1611,7 @@ again:
btrfs_delalloc_release_metadata(inode,
release_bytes);
else
- __btrfs_delalloc_release_space(inode, pos,
+ btrfs_delalloc_release_space(inode, pos,
release_bytes);
}
@@ -1664,8 +1664,7 @@ again:
btrfs_end_write_no_snapshoting(root);
btrfs_delalloc_release_metadata(inode, release_bytes);
} else {
- __btrfs_delalloc_release_space(inode, pos,
- release_bytes);
+ btrfs_delalloc_release_space(inode, pos, release_bytes);
}
}
@@ -2711,8 +2710,8 @@ static long btrfs_fallocate(struct file *file, int mode,
out:
mutex_unlock(&inode->i_mutex);
/* Let go of our reservation. */
- __btrfs_free_reserved_data_space(inode, alloc_start,
- alloc_end - alloc_start);
+ btrfs_free_reserved_data_space(inode, alloc_start,
+ alloc_end - alloc_start);
return ret;
}
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 78bc09c..767a605 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -488,17 +488,17 @@ again:
/* Just to make sure we have enough space */
prealloc += 8 * PAGE_CACHE_SIZE;
- ret = __btrfs_delalloc_reserve_space(inode, 0, prealloc);
+ ret = btrfs_delalloc_reserve_space(inode, 0, prealloc);
if (ret)
goto out_put;
ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
prealloc, prealloc, &alloc_hint);
if (ret) {
- __btrfs_delalloc_release_space(inode, 0, prealloc);
+ btrfs_delalloc_release_space(inode, 0, prealloc);
goto out_put;
}
- __btrfs_free_reserved_data_space(inode, 0, prealloc);
+ btrfs_free_reserved_data_space(inode, 0, prealloc);
ret = btrfs_write_out_ino_cache(root, trans, path, inode);
out_put:
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 38a0fb9..84c31dd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1766,8 +1766,8 @@ static void btrfs_clear_bit_hook(struct inode *inode,
if (root->root_key.objectid != BTRFS_DATA_RELOC_TREE_OBJECTID
&& do_list && !(state->state & EXTENT_NORESERVE))
- __btrfs_free_reserved_data_space(inode, state->start,
- len);
+ btrfs_free_reserved_data_space(inode, state->start,
+ len);
__percpu_counter_add(&root->fs_info->delalloc_bytes, -len,
root->fs_info->delalloc_batch);
@@ -1986,8 +1986,8 @@ again:
goto again;
}
- ret = __btrfs_delalloc_reserve_space(inode, page_start,
- PAGE_CACHE_SIZE);
+ ret = btrfs_delalloc_reserve_space(inode, page_start,
+ PAGE_CACHE_SIZE);
if (ret) {
mapping_set_error(page->mapping, ret);
end_extent_writepage(page, ret, page_start, page_end);
@@ -4583,7 +4583,7 @@ int btrfs_truncate_page(struct inode *inode, loff_t from, loff_t len,
if ((offset & (blocksize - 1)) == 0 &&
(!len || ((len & (blocksize - 1)) == 0)))
goto out;
- ret = __btrfs_delalloc_reserve_space(inode,
+ ret = btrfs_delalloc_reserve_space(inode,
round_down(from, PAGE_CACHE_SIZE), PAGE_CACHE_SIZE);
if (ret)
goto out;
@@ -4591,7 +4591,7 @@ int btrfs_truncate_page(struct inode *inode, loff_t from, loff_t len,
again:
page = find_or_create_page(mapping, index, mask);
if (!page) {
- __btrfs_delalloc_release_space(inode,
+ btrfs_delalloc_release_space(inode,
round_down(from, PAGE_CACHE_SIZE),
PAGE_CACHE_SIZE);
ret = -ENOMEM;
@@ -4661,8 +4661,8 @@ again:
out_unlock:
if (ret)
- __btrfs_delalloc_release_space(inode, page_start,
- PAGE_CACHE_SIZE);
+ btrfs_delalloc_release_space(inode, page_start,
+ PAGE_CACHE_SIZE);
unlock_page(page);
page_cache_release(page);
out:
@@ -7593,7 +7593,7 @@ unlock:
spin_unlock(&BTRFS_I(inode)->lock);
}
- __btrfs_free_reserved_data_space(inode, start, len);
+ btrfs_free_reserved_data_space(inode, start, len);
WARN_ON(dio_data->reserve < len);
dio_data->reserve -= len;
current->journal_info = dio_data;
@@ -8386,7 +8386,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
mutex_unlock(&inode->i_mutex);
relock = true;
}
- ret = __btrfs_delalloc_reserve_space(inode, offset, count);
+ ret = btrfs_delalloc_reserve_space(inode, offset, count);
if (ret)
goto out;
dio_data.outstanding_extents = div64_u64(count +
@@ -8415,11 +8415,11 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
current->journal_info = NULL;
if (ret < 0 && ret != -EIOCBQUEUED) {
if (dio_data.reserve)
- __btrfs_delalloc_release_space(inode, offset,
- dio_data.reserve);
+ btrfs_delalloc_release_space(inode, offset,
+ dio_data.reserve);
} else if (ret >= 0 && (size_t)ret < count)
- __btrfs_delalloc_release_space(inode, offset,
- count - (size_t)ret);
+ btrfs_delalloc_release_space(inode, offset,
+ count - (size_t)ret);
}
out:
if (wakeup)
@@ -8630,8 +8630,8 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
page_start = page_offset(page);
page_end = page_start + PAGE_CACHE_SIZE - 1;
- ret = __btrfs_delalloc_reserve_space(inode, page_start,
- PAGE_CACHE_SIZE);
+ ret = btrfs_delalloc_reserve_space(inode, page_start,
+ PAGE_CACHE_SIZE);
if (!ret) {
ret = file_update_time(vma->vm_file);
reserved = 1;
@@ -8726,7 +8726,7 @@ out_unlock:
}
unlock_page(page);
out:
- __btrfs_delalloc_release_space(inode, page_start, PAGE_CACHE_SIZE);
+ btrfs_delalloc_release_space(inode, page_start, PAGE_CACHE_SIZE);
out_noreserve:
sb_end_pagefault(inode->i_sb);
return ret;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 3158b0f..97aee25 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1119,7 +1119,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
page_cnt = min_t(u64, (u64)num_pages, (u64)file_end - start_index + 1);
- ret = __btrfs_delalloc_reserve_space(inode,
+ ret = btrfs_delalloc_reserve_space(inode,
start_index << PAGE_CACHE_SHIFT,
page_cnt << PAGE_CACHE_SHIFT);
if (ret)
@@ -1210,7 +1210,7 @@ again:
spin_lock(&BTRFS_I(inode)->lock);
BTRFS_I(inode)->outstanding_extents++;
spin_unlock(&BTRFS_I(inode)->lock);
- __btrfs_delalloc_release_space(inode,
+ btrfs_delalloc_release_space(inode,
start_index << PAGE_CACHE_SHIFT,
(page_cnt - i_done) << PAGE_CACHE_SHIFT);
}
@@ -1237,7 +1237,7 @@ out:
unlock_page(pages[i]);
page_cache_release(pages[i]);
}
- __btrfs_delalloc_release_space(inode,
+ btrfs_delalloc_release_space(inode,
start_index << PAGE_CACHE_SHIFT,
page_cnt << PAGE_CACHE_SHIFT);
return ret;
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index b7f6ce1..6f397ce 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2088,7 +2088,7 @@ out:
return ret;
}
-int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
+static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
{
struct btrfs_root *quota_root;
struct btrfs_qgroup *qgroup;
@@ -2221,6 +2221,11 @@ out:
spin_unlock(&fs_info->qgroup_lock);
}
+static inline void qgroup_free(struct btrfs_root *root, u64 num_bytes)
+{
+ return btrfs_qgroup_free_refroot(root->fs_info, root->objectid,
+ num_bytes);
+}
void assert_qgroups_uptodate(struct btrfs_trans_handle *trans)
{
if (list_empty(&trans->qgroup_ref_list) && !trans->delayed_ref_elem.seq)
@@ -2783,7 +2788,7 @@ static int reserve_data_range(struct btrfs_root *root,
cur_start = next_start;
}
insert:
- ret = btrfs_qgroup_reserve(root, reserve);
+ ret = qgroup_reserve(root, reserve);
if (ret < 0)
return ret;
/* ranges must be inserted after we are sure it has enough space */
@@ -3007,7 +3012,7 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
if (ret == 0)
kfree(tmp);
if (free_reserved)
- btrfs_qgroup_free(BTRFS_I(inode)->root, reserved);
+ qgroup_free(BTRFS_I(inode)->root, reserved);
spin_unlock(&map->lock);
return 0;
}
@@ -3103,7 +3108,7 @@ void btrfs_qgroup_free_data_rsv_map(struct inode *inode)
/* Reserve map should be empty, or we are leaking */
WARN_ON(dirty_map->reserved);
- btrfs_qgroup_free(root, dirty_map->reserved);
+ qgroup_free(root, dirty_map->reserved);
spin_lock(&dirty_map->lock);
while ((node = rb_first(&dirty_map->root)) != NULL) {
struct data_rsv_range *range;
@@ -3130,7 +3135,7 @@ int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes)
return 0;
BUG_ON(num_bytes != round_down(num_bytes, root->nodesize));
- ret = btrfs_qgroup_reserve(root, num_bytes);
+ ret = qgroup_reserve(root, num_bytes);
if (ret < 0)
return ret;
atomic_add(num_bytes, &root->qgroup_meta_rsv);
@@ -3147,7 +3152,7 @@ void btrfs_qgroup_free_meta_all(struct btrfs_root *root)
reserved = atomic_xchg(&root->qgroup_meta_rsv, 0);
if (reserved == 0)
return;
- btrfs_qgroup_free(root, reserved);
+ qgroup_free(root, reserved);
}
void btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes)
@@ -3158,5 +3163,5 @@ void btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes)
BUG_ON(num_bytes != round_down(num_bytes, root->nodesize));
WARN_ON(atomic_read(&root->qgroup_meta_rsv) < num_bytes);
atomic_sub(num_bytes, &root->qgroup_meta_rsv);
- btrfs_qgroup_free(root, num_bytes);
+ qgroup_free(root, num_bytes);
}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 47d75cb..3f6ad43 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -74,15 +74,8 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
struct btrfs_qgroup_inherit *inherit);
-int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info,
u64 ref_root, u64 num_bytes);
-static inline void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes)
-{
- return btrfs_qgroup_free_refroot(root->fs_info, root->objectid,
- num_bytes);
-}
-
/*
* TODO: Add proper trace point for it, as btrfs_qgroup_free() is
* called by everywhere, can't provide good trace for delayed ref case.
@@ -92,7 +85,6 @@ static inline void btrfs_qgroup_free_delayed_ref(struct btrfs_fs_info *fs_info,
{
btrfs_qgroup_free_refroot(fs_info, ref_root, num_bytes);
}
-
void assert_qgroups_uptodate(struct btrfs_trans_handle *trans);
#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index f4621c5..f823276 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3034,8 +3034,8 @@ int prealloc_file_extent_cluster(struct inode *inode,
BUG_ON(cluster->start != cluster->boundary[0]);
mutex_lock(&inode->i_mutex);
- ret = __btrfs_check_data_free_space(inode, cluster->start,
- cluster->end + 1 - cluster->start);
+ ret = btrfs_check_data_free_space(inode, cluster->start,
+ cluster->end + 1 - cluster->start);
if (ret)
goto out;
@@ -3056,8 +3056,8 @@ int prealloc_file_extent_cluster(struct inode *inode,
break;
nr++;
}
- __btrfs_free_reserved_data_space(inode, cluster->start,
- cluster->end + 1 - cluster->start);
+ btrfs_free_reserved_data_space(inode, cluster->start,
+ cluster->end + 1 - cluster->start);
out:
mutex_unlock(&inode->i_mutex);
return ret;
--
2.6.1
^ permalink raw reply related [flat|nested] 28+ messages in thread

* Re: [PATCH v2 00/23] Rework btrfs qgroup reserved space framework
2015-10-09 2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
` (22 preceding siblings ...)
2015-10-09 4:08 ` [PATCH v2 17/23] btrfs: qgroup: Cleanup old inaccurate facilities Qu Wenruo
@ 2015-10-09 4:36 ` Josef Bacik
2015-10-09 5:45 ` Qu Wenruo
23 siblings, 1 reply; 28+ messages in thread
From: Josef Bacik @ 2015-10-09 4:36 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
On 10/08/2015 07:11 PM, Qu Wenruo wrote:
> In the previous qgroup rework, we succeeded in fixing the qgroup
> accounting part, making the rfer/excl numbers accurate.
>
> But that's just part of the qgroup work; another part, the qgroup
> reserved space part, still has quite a lot of problems and will lead
> to EDQUOT even when we are far from the limit.
>
> [[BUG]]
> The easiest way to trigger the bug is,
> 1) Enable quota
> 2) Limit excl of qgroup 5 to 16M
> 3) Write [0,2M) of a file inside subvol 5 10 times without sync
>
> EDQUOT will be triggered at about the 8th write.
> But after a remount, we can still write until about 15M.
>
> [[CAUSE]]
> The problem is caused by the fact that qgroup will reserve space even
> when the data space is already reserved.
>
> In the above reproducer, each time we buffered-write [0,2M), qgroup
> will reserve 2M of space; but in fact, after the 1st time we have
> already reserved 2M, and from then on we don't need to reserve any
> data space as we are only writing [0,2M).
>
> Also, the reserved space will only be freed *ONCE* when its backref is
> run at commit_transaction() time.
>
> That's what causes the reserved space to leak.
>
> [[FIX]]
> The fix is not a simple one, as currently btrfs_qgroup_reserve() will
> allocate whatever the caller asked for.
>
> So for accurate qgroup reserve, we introduce a completely new framework
> for data and metadata.
> 1) Per-inode data reserve map
> Now, each inode will have a data reserve map, recording which range
> of data is already reserved.
> If we are writing a range which is already reserved, we won't need to
> reserve space again.
>
> Also, since qgroup is only accounted at commit_trans() time, when data
> is committed to disk and its metadata is also inserted into the
> current tree, we should free the reserved data range, but still keep
> the reserved space until commit_trans().
>
> So delayed_ref_head will have new members to record how much space is
> reserved and free them at commit_trans() time.
This is already handled by setting DELALLOC in the io_tree; we do a
similar sort of thing for the normal enospc accounting, so why not use
that? Thanks,
Josef
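
To make the range-tracking idea concrete, here is a toy, self-contained
userspace C model (not code from the thread or from the kernel; every
name in it is invented for illustration). It charges only the bytes not
already covered by a tracked range, which is why rewriting a dirty range
reserves nothing extra, and it reproduces the 8K/+0/4K example from the
patchset's own comments:

/* Toy model of range-based reservation at page granularity. */
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SZ 4096ULL
#define NPAGES  64              /* model a small 256K file */

static bool reserved_page[NPAGES];      /* the per-inode "dirty map" */
static unsigned long long total_reserved;

/* Reserve for a buffered write of [start, start + len):
 * charge only the pages that are not already tracked. */
static unsigned long long reserve(unsigned long long start,
				  unsigned long long len)
{
	unsigned long long newly = 0;
	unsigned long long first = start / PAGE_SZ;
	unsigned long long last = (start + len + PAGE_SZ - 1) / PAGE_SZ;

	for (unsigned long long p = first; p < last; p++) {
		if (!reserved_page[p]) {
			reserved_page[p] = true;
			newly += PAGE_SZ;
		}
	}
	total_reserved += newly;
	return newly;
}

int main(void)
{
	/* Mirrors the patchset's example: +8K, +0, +4K => 12K total. */
	printf("write [0,8K):    +%lluK\n", reserve(0, 8192) / 1024);
	printf("write [0,4K):    +%lluK\n", reserve(0, 4096) / 1024);
	printf("write [12K,16K): +%lluK\n", reserve(12288, 4096) / 1024);
	printf("total reserved:   %lluK\n", total_reserved / 1024);
	return 0;
}

The same bookkeeping could be kept as bits in an io_tree (Josef's
suggestion) or as an rbtree of ranges (the patchset); the accounting
rule is identical either way.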
^ permalink raw reply [flat|nested] 28+ messages in thread

* Re: [PATCH v2 00/23] Rework btrfs qgroup reserved space framework
2015-10-09 4:36 ` [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Josef Bacik
@ 2015-10-09 5:45 ` Qu Wenruo
2015-10-09 6:41 ` Filipe Manana
0 siblings, 1 reply; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 5:45 UTC (permalink / raw)
To: Josef Bacik, linux-btrfs
Josef Bacik wrote on 2015/10/08 21:36 -0700:
> On 10/08/2015 07:11 PM, Qu Wenruo wrote:
>> [...]
>
> This is already handled by setting DELALLOC in the io_tree; we do a
> similar sort of thing for the normal enospc accounting, so why not use
> that? Thanks,
>
> Josef
Thanks for pointing this out.
I was also searching for an existing facility, but didn't find one, as
I'm not familiar with the io_tree.
After a quick glance, it seems to fit the need quite well, but I'm not
completely sure.
I'll keep investigating it and try to use it.
BTW, from what I understand, __btrfs_buffered_write() should cause the
range to be marked DELALLOC, but I didn't find any call to
set_extent_delalloc(). Is that done somewhere else?
Thanks,
Qu
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 00/23] Rework btrfs qgroup reserved space framework
2015-10-09 5:45 ` Qu Wenruo
@ 2015-10-09 6:41 ` Filipe Manana
2015-10-09 8:19 ` Qu Wenruo
0 siblings, 1 reply; 28+ messages in thread
From: Filipe Manana @ 2015-10-09 6:41 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Josef Bacik, linux-btrfs@vger.kernel.org
On Fri, Oct 9, 2015 at 6:45 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>
>
> Josef Bacik wrote on 2015/10/08 21:36 -0700:
>>
>> On 10/08/2015 07:11 PM, Qu Wenruo wrote:
>>> [...]
>>
>>
>> This is already handled by setting DELALLOC in the io_tree; we do a
>> similar sort of thing for the normal enospc accounting, so why not
>> use that? Thanks,
>>
>> Josef
>
>
> [...]
> BTW, from what I understand, __btrfs_buffered_write() should cause the
> range to be marked DELALLOC, but I didn't find any call to
> set_extent_delalloc(). Is that done somewhere else?
__btrfs_buffered_write() -> btrfs_dirty_pages() -> btrfs_set_extent_delalloc()
>
> Thanks,
> Qu
--
Filipe David Manana,
"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 00/23] Rework btrfs qgroup reserved space framework
2015-10-09 6:41 ` Filipe Manana
@ 2015-10-09 8:19 ` Qu Wenruo
0 siblings, 0 replies; 28+ messages in thread
From: Qu Wenruo @ 2015-10-09 8:19 UTC (permalink / raw)
To: fdmanana, Josef Bacik; +Cc: linux-btrfs@vger.kernel.org
Filipe Manana wrote on 2015/10/09 07:41 +0100:
> On Fri, Oct 9, 2015 at 6:45 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>
>>
>> Josef Bacik wrote on 2015/10/08 21:36 -0700:
>>>
>>> On 10/08/2015 07:11 PM, Qu Wenruo wrote:
>>>> [...]
>>>
>>>
>>> This is already handled by setting DELALLOC in the io_tree; we do a
>>> similar sort of thing for the normal enospc accounting, so why not
>>> use that? Thanks,
>>>
>>> Josef
>>
>>
>> [...]
>> BTW, from what I understand, __btrfs_buffered_write() should cause
>> the range to be marked DELALLOC, but I didn't find any call to
>> set_extent_delalloc(). Is that done somewhere else?
>
> __btrfs_buffered_write() -> btrfs_dirty_pages() -> btrfs_set_extent_delalloc()
>
Thanks,
I also found the call sequence via dump_stack().
And to Josef: after some reading, the timing of clearing DELALLOC is
not perfect for the qgroup case.
For the buffered/mapped write case, the difference is acceptable, as
DELALLOC is marked at buffered write or page-mkwrite time.
Clearing DELALLOC just happens a little early, at cow_file_range()
rather than at finish_ordered_io() as in my patchset.
The difference is acceptable for that case.
But if we use the DELALLOC flag, we can't handle fallocate(), as it
doesn't use DELALLOC at all.
Current                       | Patchset
btrfs_fallocate()             | btrfs_fallocate()
*NO* DELALLOC flag set/clear  | -> btrfs_qgroup_reserve()
                              |    -> reserve qgroup space
                              |       for each needed range.
                              | -> btrfs_prealloc_file_range()
                              |    -> free qgroup space
So at least an extra extent flag is needed for accurate qgroup reserve
(a sketch of that idea follows below).
But still, thanks a lot, as I can now reuse the io_tree for this rather
than hand-coding over 1K lines of new code.
Thanks,
Qu
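
As an illustration of the "extra extent flag" idea above -- a rough
sketch only, not code from the patchset: the bit name
EXTENT_QGROUP_RESERVED, the helper count_range_bits_not_set(), and the
exact io_tree helper signatures here are assumptions written against a
4.3-era API:

/* Hypothetical sketch: tag reserved data ranges in the inode's io_tree
 * with a dedicated bit, so overlapping reserves (from buffered write,
 * page mkwrite or fallocate) are never charged twice. */
static int qgroup_reserve_data(struct inode *inode, u64 start, u64 len)
{
	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
	u64 to_reserve;
	int ret;

	/* Hypothetical helper: count the bytes in [start, start + len)
	 * that do NOT yet carry the reserved bit. */
	to_reserve = count_range_bits_not_set(tree, start, start + len - 1,
					      EXTENT_QGROUP_RESERVED);

	ret = qgroup_reserve(BTRFS_I(inode)->root, to_reserve);
	if (ret < 0)
		return ret;

	/* Tag the whole range; a later overlapping write sees the bit
	 * and reserves nothing extra.  fallocate() would call this too,
	 * even though it never sets DELALLOC. */
	return set_extent_bits(tree, start, start + len - 1,
			       EXTENT_QGROUP_RESERVED, GFP_NOFS);
}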
^ permalink raw reply [flat|nested] 28+ messages in thread