linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] Btrfs: qgroup: free reserved in exceeding quota.
       [not found] <BA68E495-8B7B-4F0F-A64B-0413B1480B9C@free.fr>
@ 2014-12-09 11:27 ` Dongsheng Yang
  2014-12-09 11:27   ` [PATCH 2/2] Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use Dongsheng Yang
  2014-12-09 15:42   ` [PATCH 1/2] Btrfs: qgroup: free reserved in exceeding quota Josef Bacik
  0 siblings, 2 replies; 20+ messages in thread
From: Dongsheng Yang @ 2014-12-09 11:27 UTC (permalink / raw)
  To: cyril.scetbon, jbacik; +Cc: linux-btrfs, Dongsheng Yang

When we exceed quota limit in writing, we will free
some reserved extent when we need to drop but not free
account in qgroup. It means, each time we exceed quota
in writing, there will be some remain space in qg->reserved
we can not use any more. If things go on like this, the
all space will be ate up.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a84e00d..014b7f2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5262,8 +5262,11 @@ out_fail:
 			to_free = 0;
 	}
 	spin_unlock(&BTRFS_I(inode)->lock);
-	if (dropped)
+	if (dropped) {
+		if (root->fs_info->quota_enabled)
+			btrfs_qgroup_free(root, dropped * root->nodesize);
 		to_free += btrfs_calc_trans_metadata_size(root, dropped);
+	}
 
 	if (to_free) {
 		btrfs_block_rsv_release(root, block_rsv, to_free);
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/2] Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use.
  2014-12-09 11:27 ` [PATCH 1/2] Btrfs: qgroup: free reserved in exceeding quota Dongsheng Yang
@ 2014-12-09 11:27   ` Dongsheng Yang
  2014-12-09 15:55     ` Josef Bacik
  2014-12-09 15:42   ` [PATCH 1/2] Btrfs: qgroup: free reserved in exceeding quota Josef Bacik
  1 sibling, 1 reply; 20+ messages in thread
From: Dongsheng Yang @ 2014-12-09 11:27 UTC (permalink / raw)
  To: cyril.scetbon, jbacik; +Cc: linux-btrfs, Dongsheng Yang

Currently, for pre_alloc or delay_alloc, the bytes will be accounted
in space_info by the three guys.
space_info->bytes_may_use --- space_info->reserved --- space_info->used.
But on the other hand, in qgroup, there are only two counters to account the
bytes, qgroup->reserved and qgroup->excl. And qg->reserved accounts
bytes in space_info->bytes_may_use and qg->excl accounts bytes in
space_info->used. So the bytes in space_info->reserved is not accounted
in qgroup. If so, there is a window we can exceed the quota limit when
bytes is in space_info->reserved.

Example:
	# btrfs quota enable /mnt
	# btrfs qgroup limit -e 10M /mnt
	# for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
	# sync
	# btrfs qgroup show -pcre /mnt
qgroupid rfer     excl     max_rfer max_excl parent  child
-------- ----     ----     -------- -------- ------  -----
0/5      20987904 20987904 0        10485760 ---     ---

qg->excl is 20987904 larger than max_excl 10485760.

This patch introduce a new counter named may_use to qgroup, then
there are three counters in qgroup to account bytes in space_info
as below.
space_info->bytes_may_use --- space_info->reserved --- space_info->used.
qgroup->may_use           --- qgroup->reserved     --- qgroup->excl

With this patch applied:
	# btrfs quota enable /mnt
	# btrfs qgroup limit -e 10M /mnt
	# for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
fallocate: /mnt/data9: fallocate failed: Disk quota exceeded
fallocate: /mnt/data10: fallocate failed: Disk quota exceeded
fallocate: /mnt/data11: fallocate failed: Disk quota exceeded
fallocate: /mnt/data12: fallocate failed: Disk quota exceeded
fallocate: /mnt/data13: fallocate failed: Disk quota exceeded
fallocate: /mnt/data14: fallocate failed: Disk quota exceeded
fallocate: /mnt/data15: fallocate failed: Disk quota exceeded
fallocate: /mnt/data16: fallocate failed: Disk quota exceeded
fallocate: /mnt/data17: fallocate failed: Disk quota exceeded
fallocate: /mnt/data18: fallocate failed: Disk quota exceeded
fallocate: /mnt/data19: fallocate failed: Disk quota exceeded
	# sync
	# btrfs qgroup show -pcre /mnt
qgroupid rfer    excl    max_rfer max_excl parent  child
-------- ----    ----    -------- -------- ------  -----
0/5      9453568 9453568 0        10485760 ---     ---

Reported-by: Cyril SCETBON <cyril.scetbon@free.fr>
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c | 25 ++++++++++++++++++-
 fs/btrfs/inode.c       | 22 +++++++++++++++-
 fs/btrfs/qgroup.c      | 68 +++++++++++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/qgroup.h      |  4 +++
 4 files changed, 113 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 014b7f2..9eaf268 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5500,8 +5500,13 @@ static int pin_down_extent(struct btrfs_root *root,
 
 	set_extent_dirty(root->fs_info->pinned_extents, bytenr,
 			 bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
-	if (reserved)
+	if (reserved) {
+		if (root->fs_info->quota_enabled)
+			btrfs_qgroup_update_reserved_bytes(root->fs_info,
+							   root->root_key.objectid,
+							   num_bytes, -1);
 		trace_btrfs_reserved_extent_free(root, bytenr, num_bytes);
+	}
 	return 0;
 }
 
@@ -6230,6 +6235,10 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 		btrfs_update_reserved_bytes(cache, buf->len, RESERVE_FREE, 0);
 		trace_btrfs_reserved_extent_free(root, buf->start, buf->len);
 		pin = 0;
+		if (root->fs_info->quota_enabled)
+			btrfs_qgroup_update_reserved_bytes(root->fs_info,
+							   root->root_key.objectid,
+							   buf->len, -1);
 	}
 out:
 	if (pin)
@@ -6964,7 +6973,12 @@ static int __btrfs_free_reserved_extent(struct btrfs_root *root,
 	else {
 		btrfs_add_free_space(cache, start, len);
 		btrfs_update_reserved_bytes(cache, len, RESERVE_FREE, delalloc);
+		if (root->fs_info->quota_enabled)
+			btrfs_qgroup_update_reserved_bytes(root->fs_info,
+							   root->root_key.objectid,
+							   len, -1);
 	}
+
 	btrfs_put_block_group(cache);
 
 	trace_btrfs_reserved_extent_free(root, start, len);
@@ -7200,6 +7214,10 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
 	BUG_ON(ret); /* logic error */
 	ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
 					 0, owner, offset, ins, 1);
+	if (root->fs_info->quota_enabled)
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root->root_key.objectid,
+						   ins->offset, 1);
 	btrfs_put_block_group(block_group);
 	return ret;
 }
@@ -7346,6 +7364,11 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
 		return ERR_PTR(ret);
 	}
 
+	if (root->fs_info->quota_enabled)
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root_objectid,
+						   ins.offset, 1);
+
 	buf = btrfs_init_new_buffer(trans, root, ins.objectid,
 				    blocksize, level);
 	BUG_ON(IS_ERR(buf)); /* -ENOMEM */
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d23362f..4faf794 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -59,6 +59,7 @@
 #include "backref.h"
 #include "hash.h"
 #include "props.h"
+#include "qgroup.h"
 
 struct btrfs_iget_args {
 	struct btrfs_key *location;
@@ -739,7 +740,10 @@ retry:
 			}
 			goto out_free;
 		}
-
+		if (root->fs_info->quota_enabled)
+			btrfs_qgroup_update_reserved_bytes(root->fs_info,
+							   root->root_key.objectid,
+							   ins.offset, 1);
 		/*
 		 * here we're doing allocation and writeback of the
 		 * compressed pages
@@ -951,6 +955,11 @@ static noinline int cow_file_range(struct inode *inode,
 		if (ret < 0)
 			goto out_unlock;
 
+		if (root->fs_info->quota_enabled)
+			btrfs_qgroup_update_reserved_bytes(root->fs_info,
+							   root->root_key.objectid,
+							   ins.offset, 1);
+
 		em = alloc_extent_map();
 		if (!em) {
 			ret = -ENOMEM;
@@ -6745,6 +6754,11 @@ static struct extent_map *btrfs_new_extent_direct(struct inode *inode,
 		return ERR_PTR(ret);
 	}
 
+	if (root->fs_info->quota_enabled)
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root->root_key.objectid,
+						   ins.offset, 1);
+
 	return em;
 }
 
@@ -9266,6 +9280,12 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 				btrfs_end_transaction(trans, root);
 			break;
 		}
+
+		if (root->fs_info->quota_enabled)
+			btrfs_qgroup_update_reserved_bytes(root->fs_info,
+							   root->root_key.objectid,
+							   ins.offset, 1);
+
 		btrfs_drop_extent_cache(inode, cur_offset,
 					cur_offset + ins.offset -1, 0);
 
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 48b60db..d147082 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -72,6 +72,7 @@ struct btrfs_qgroup {
 	/*
 	 * reservation tracking
 	 */
+	u64 may_use;
 	u64 reserved;
 
 	/*
@@ -1414,6 +1415,8 @@ static int qgroup_excl_accounting(struct btrfs_fs_info *fs_info,
 	WARN_ON(sign < 0 && qgroup->excl < oper->num_bytes);
 	qgroup->excl += sign * oper->num_bytes;
 	qgroup->excl_cmpr += sign * oper->num_bytes;
+	if (sign > 0)
+		qgroup->reserved -= oper->num_bytes;
 
 	qgroup_dirty(fs_info, qgroup);
 
@@ -1434,6 +1437,8 @@ static int qgroup_excl_accounting(struct btrfs_fs_info *fs_info,
 		qgroup->excl += sign * oper->num_bytes;
 		if (sign < 0)
 			WARN_ON(qgroup->excl < oper->num_bytes);
+		if (sign > 0)
+			qgroup->reserved -= oper->num_bytes;
 		qgroup->excl_cmpr += sign * oper->num_bytes;
 		qgroup_dirty(fs_info, qgroup);
 
@@ -2359,6 +2364,61 @@ out:
 	return ret;
 }
 
+int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
+					    u64 ref_root,
+					    u64 num_bytes,
+					    int sign)
+{
+	struct btrfs_root *quota_root;
+	struct btrfs_qgroup *qgroup;
+	int ret = 0;
+	struct ulist_node *unode;
+	struct ulist_iterator uiter;
+
+	if (!is_fstree(ref_root) || !fs_info->quota_enabled)
+		return 0;
+
+	if (num_bytes == 0)
+		return 0;
+
+	spin_lock(&fs_info->qgroup_lock);
+	quota_root = fs_info->quota_root;
+	if (!quota_root)
+		goto out;
+
+	qgroup = find_qgroup_rb(fs_info, ref_root);
+	if (!qgroup)
+		goto out;
+
+	ulist_reinit(fs_info->qgroup_ulist);
+	ret = ulist_add(fs_info->qgroup_ulist, qgroup->qgroupid,
+			(uintptr_t)qgroup, GFP_ATOMIC);
+	if (ret < 0)
+		goto out;
+
+	ULIST_ITER_INIT(&uiter);
+	while ((unode = ulist_next(fs_info->qgroup_ulist, &uiter))) {
+		struct btrfs_qgroup *qg;
+		struct btrfs_qgroup_list *glist;
+
+		qg = u64_to_ptr(unode->aux);
+
+		qg->reserved += sign * num_bytes;
+
+		list_for_each_entry(glist, &qg->groups, next_group) {
+			ret = ulist_add(fs_info->qgroup_ulist,
+					glist->group->qgroupid,
+					(uintptr_t)glist->group, GFP_ATOMIC);
+			if (ret < 0)
+				goto out;
+		}
+	}
+
+out:
+	spin_unlock(&fs_info->qgroup_lock);
+	return ret;
+}
+
 /*
  * reserve some space for a qgroup and all its parents. The reservation takes
  * place with start_transaction or dealloc_reserve, similar to ENOSPC
@@ -2407,14 +2467,14 @@ int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
 		qg = u64_to_ptr(unode->aux);
 
 		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
-		    qg->reserved + (s64)qg->rfer + num_bytes >
+		    qg->reserved + qg->may_use + (s64)qg->rfer + num_bytes >
 		    qg->max_rfer) {
 			ret = -EDQUOT;
 			goto out;
 		}
 
 		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) &&
-		    qg->reserved + (s64)qg->excl + num_bytes >
+		    qg->reserved + qg->may_use + (s64)qg->excl + num_bytes >
 		    qg->max_excl) {
 			ret = -EDQUOT;
 			goto out;
@@ -2438,7 +2498,7 @@ int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
 
 		qg = u64_to_ptr(unode->aux);
 
-		qg->reserved += num_bytes;
+		qg->may_use += num_bytes;
 	}
 
 out:
@@ -2484,7 +2544,7 @@ void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes)
 
 		qg = u64_to_ptr(unode->aux);
 
-		qg->reserved -= num_bytes;
+		qg->may_use -= num_bytes;
 
 		list_for_each_entry(glist, &qg->groups, next_group) {
 			ret = ulist_add(fs_info->qgroup_ulist,
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 18cc68c..31de88e 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -95,6 +95,10 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
 			 struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
 			 struct btrfs_qgroup_inherit *inherit);
+int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
+				       u64 ref_root,
+				       u64 num_bytes,
+				       int sign);
 int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
 void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);
 
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] Btrfs: qgroup: free reserved in exceeding quota.
  2014-12-09 11:27 ` [PATCH 1/2] Btrfs: qgroup: free reserved in exceeding quota Dongsheng Yang
  2014-12-09 11:27   ` [PATCH 2/2] Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use Dongsheng Yang
@ 2014-12-09 15:42   ` Josef Bacik
  2014-12-10  8:09     ` Dongsheng Yang
  1 sibling, 1 reply; 20+ messages in thread
From: Josef Bacik @ 2014-12-09 15:42 UTC (permalink / raw)
  To: Dongsheng Yang, cyril.scetbon; +Cc: linux-btrfs

On 12/09/2014 06:27 AM, Dongsheng Yang wrote:
> When we exceed quota limit in writing, we will free
> some reserved extent when we need to drop but not free
> account in qgroup. It means, each time we exceed quota
> in writing, there will be some remain space in qg->reserved
> we can not use any more. If things go on like this, the
> all space will be ate up.
>
> Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
> ---
>   fs/btrfs/extent-tree.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index a84e00d..014b7f2 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -5262,8 +5262,11 @@ out_fail:
>   			to_free = 0;
>   	}
>   	spin_unlock(&BTRFS_I(inode)->lock);
> -	if (dropped)
> +	if (dropped) {
> +		if (root->fs_info->quota_enabled)
> +			btrfs_qgroup_free(root, dropped * root->nodesize);

This needs to be num_bytes + dropped * root->nodesize.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use.
  2014-12-09 11:27   ` [PATCH 2/2] Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use Dongsheng Yang
@ 2014-12-09 15:55     ` Josef Bacik
  2014-12-10  8:09       ` Dongsheng Yang
  0 siblings, 1 reply; 20+ messages in thread
From: Josef Bacik @ 2014-12-09 15:55 UTC (permalink / raw)
  To: Dongsheng Yang, cyril.scetbon; +Cc: linux-btrfs

On 12/09/2014 06:27 AM, Dongsheng Yang wrote:
> Currently, for pre_alloc or delay_alloc, the bytes will be accounted
> in space_info by the three guys.
> space_info->bytes_may_use --- space_info->reserved --- space_info->used.
> But on the other hand, in qgroup, there are only two counters to account the
> bytes, qgroup->reserved and qgroup->excl. And qg->reserved accounts
> bytes in space_info->bytes_may_use and qg->excl accounts bytes in
> space_info->used. So the bytes in space_info->reserved is not accounted
> in qgroup. If so, there is a window we can exceed the quota limit when
> bytes is in space_info->reserved.
>
> Example:
> 	# btrfs quota enable /mnt
> 	# btrfs qgroup limit -e 10M /mnt
> 	# for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
> 	# sync
> 	# btrfs qgroup show -pcre /mnt
> qgroupid rfer     excl     max_rfer max_excl parent  child
> -------- ----     ----     -------- -------- ------  -----
> 0/5      20987904 20987904 0        10485760 ---     ---
>
> qg->excl is 20987904 larger than max_excl 10485760.
>
> This patch introduce a new counter named may_use to qgroup, then
> there are three counters in qgroup to account bytes in space_info
> as below.
> space_info->bytes_may_use --- space_info->reserved --- space_info->used.
> qgroup->may_use           --- qgroup->reserved     --- qgroup->excl
>
> With this patch applied:
> 	# btrfs quota enable /mnt
> 	# btrfs qgroup limit -e 10M /mnt
> 	# for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
> fallocate: /mnt/data9: fallocate failed: Disk quota exceeded
> fallocate: /mnt/data10: fallocate failed: Disk quota exceeded
> fallocate: /mnt/data11: fallocate failed: Disk quota exceeded
> fallocate: /mnt/data12: fallocate failed: Disk quota exceeded
> fallocate: /mnt/data13: fallocate failed: Disk quota exceeded
> fallocate: /mnt/data14: fallocate failed: Disk quota exceeded
> fallocate: /mnt/data15: fallocate failed: Disk quota exceeded
> fallocate: /mnt/data16: fallocate failed: Disk quota exceeded
> fallocate: /mnt/data17: fallocate failed: Disk quota exceeded
> fallocate: /mnt/data18: fallocate failed: Disk quota exceeded
> fallocate: /mnt/data19: fallocate failed: Disk quota exceeded
> 	# sync
> 	# btrfs qgroup show -pcre /mnt
> qgroupid rfer    excl    max_rfer max_excl parent  child
> -------- ----    ----    -------- -------- ------  -----
> 0/5      9453568 9453568 0        10485760 ---     ---
>
> Reported-by: Cyril SCETBON <cyril.scetbon@free.fr>
> Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
> ---
>   fs/btrfs/extent-tree.c | 25 ++++++++++++++++++-
>   fs/btrfs/inode.c       | 22 +++++++++++++++-
>   fs/btrfs/qgroup.c      | 68 +++++++++++++++++++++++++++++++++++++++++++++++---
>   fs/btrfs/qgroup.h      |  4 +++
>   4 files changed, 113 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 014b7f2..9eaf268 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -5500,8 +5500,13 @@ static int pin_down_extent(struct btrfs_root *root,
>
>   	set_extent_dirty(root->fs_info->pinned_extents, bytenr,
>   			 bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
> -	if (reserved)
> +	if (reserved) {
> +		if (root->fs_info->quota_enabled)

You already have this check in btrfs_qgroup_update_reserved_bytes, just 
call it unconditionally everywhere in this patch.  Otherwise this looks 
good, thanks,

Josef

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] Btrfs: qgroup: free reserved in exceeding quota.
  2014-12-09 15:42   ` [PATCH 1/2] Btrfs: qgroup: free reserved in exceeding quota Josef Bacik
@ 2014-12-10  8:09     ` Dongsheng Yang
  2014-12-11 18:25       ` Josef Bacik
  0 siblings, 1 reply; 20+ messages in thread
From: Dongsheng Yang @ 2014-12-10  8:09 UTC (permalink / raw)
  To: Josef Bacik, cyril.scetbon; +Cc: linux-btrfs

On 12/09/2014 11:42 PM, Josef Bacik wrote:
> On 12/09/2014 06:27 AM, Dongsheng Yang wrote:
>> When we exceed quota limit in writing, we will free
>> some reserved extent when we need to drop but not free
>> account in qgroup. It means, each time we exceed quota
>> in writing, there will be some remain space in qg->reserved
>> we can not use any more. If things go on like this, the
>> all space will be ate up.
>>
>> Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
>> ---
>>   fs/btrfs/extent-tree.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index a84e00d..014b7f2 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -5262,8 +5262,11 @@ out_fail:
>>               to_free = 0;
>>       }
>>       spin_unlock(&BTRFS_I(inode)->lock);
>> -    if (dropped)
>> +    if (dropped) {
>> +        if (root->fs_info->quota_enabled)
>> +            btrfs_qgroup_free(root, dropped * root->nodesize);
>
> This needs to be num_bytes + dropped * root->nodesize.  Thanks,

Let me try to explain why it did not free num_bytes here.

In out_fail, we did not reserve num_bytes in qgroup successfully, then 
we do not need
to free it in out_fail.

The problem this patch attempts to solve is that, when we run into 
out_fail here,
we will drop a outstanding extent. That said, in out_fail here, the 
extra reserved
nodesize for some extents should be freed.

Example:
     1). BTRFS_I(inode)->reserved_extents: 2, 
BTRFS_I(inode)->outstanding_extents: 1.
           In this case, we go intobtrfs_delalloc_reserve_metadata(). 
outstanding_extents
           will be increased at first. then 
BTRFS_I(inode)->outstanding_extents is 2.
           If we want to reserve space and failed. it will goto out_fail.
     2). In out_failed: reserved_extents is 2, outstanding_extents is 2. 
we will get a dropped of 1
          from dropping_outstanding_extent(). And now, 
reserved_extents:1, outstanding_extents:1.

In step 2, we just decrease the reserved_extents without freeing the 
related nodesize in qgroup
at the same time. So it will cause the problem I described in changelog 
which will eat the space.

Therefore, this patch here will free the nodesize related with the 
dropped extents in step 2.
About the num_bytes, as we did not reserve it successfully, no need to 
free it.

With my poor english, there must be something confusing in my 
description. Please correct me
if anything is wrong or not-good-explained.

Thanx
Yang
>
> Josef
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use.
  2014-12-09 15:55     ` Josef Bacik
@ 2014-12-10  8:09       ` Dongsheng Yang
  0 siblings, 0 replies; 20+ messages in thread
From: Dongsheng Yang @ 2014-12-10  8:09 UTC (permalink / raw)
  To: Josef Bacik, cyril.scetbon; +Cc: linux-btrfs

On 12/09/2014 11:55 PM, Josef Bacik wrote:
> On 12/09/2014 06:27 AM, Dongsheng Yang wrote:
>> Currently, for pre_alloc or delay_alloc, the bytes will be accounted
>> in space_info by the three guys.
>> space_info->bytes_may_use --- space_info->reserved --- space_info->used.
>> But on the other hand, in qgroup, there are only two counters to 
>> account the
>> bytes, qgroup->reserved and qgroup->excl. And qg->reserved accounts
>> bytes in space_info->bytes_may_use and qg->excl accounts bytes in
>> space_info->used. So the bytes in space_info->reserved is not accounted
>> in qgroup. If so, there is a window we can exceed the quota limit when
>> bytes is in space_info->reserved.
>>
>> Example:
>>     # btrfs quota enable /mnt
>>     # btrfs qgroup limit -e 10M /mnt
>>     # for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
>>     # sync
>>     # btrfs qgroup show -pcre /mnt
>> qgroupid rfer     excl     max_rfer max_excl parent  child
>> -------- ----     ----     -------- -------- ------  -----
>> 0/5      20987904 20987904 0        10485760 ---     ---
>>
>> qg->excl is 20987904 larger than max_excl 10485760.
>>
>> This patch introduce a new counter named may_use to qgroup, then
>> there are three counters in qgroup to account bytes in space_info
>> as below.
>> space_info->bytes_may_use --- space_info->reserved --- space_info->used.
>> qgroup->may_use           --- qgroup->reserved     --- qgroup->excl
>>
>> With this patch applied:
>>     # btrfs quota enable /mnt
>>     # btrfs qgroup limit -e 10M /mnt
>>     # for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
>> fallocate: /mnt/data9: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data10: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data11: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data12: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data13: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data14: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data15: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data16: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data17: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data18: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data19: fallocate failed: Disk quota exceeded
>>     # sync
>>     # btrfs qgroup show -pcre /mnt
>> qgroupid rfer    excl    max_rfer max_excl parent  child
>> -------- ----    ----    -------- -------- ------  -----
>> 0/5      9453568 9453568 0        10485760 ---     ---
>>
>> Reported-by: Cyril SCETBON <cyril.scetbon@free.fr>
>> Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
>> ---
>>   fs/btrfs/extent-tree.c | 25 ++++++++++++++++++-
>>   fs/btrfs/inode.c       | 22 +++++++++++++++-
>>   fs/btrfs/qgroup.c      | 68 
>> +++++++++++++++++++++++++++++++++++++++++++++++---
>>   fs/btrfs/qgroup.h      |  4 +++
>>   4 files changed, 113 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 014b7f2..9eaf268 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -5500,8 +5500,13 @@ static int pin_down_extent(struct btrfs_root 
>> *root,
>>
>>       set_extent_dirty(root->fs_info->pinned_extents, bytenr,
>>                bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
>> -    if (reserved)
>> +    if (reserved) {
>> +        if (root->fs_info->quota_enabled)
>
> You already have this check in btrfs_qgroup_update_reserved_bytes, 
> just call it unconditionally everywhere in this patch.  Otherwise this 
> looks good, thanks,

Thanx, I will update it in V2.
>
> Josef
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> .
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] Btrfs: qgroup: free reserved in exceeding quota.
  2014-12-10  8:09     ` Dongsheng Yang
@ 2014-12-11 18:25       ` Josef Bacik
  2014-12-12  0:44         ` Dongsheng Yang
  0 siblings, 1 reply; 20+ messages in thread
From: Josef Bacik @ 2014-12-11 18:25 UTC (permalink / raw)
  To: Dongsheng Yang, cyril.scetbon; +Cc: linux-btrfs

On 12/10/2014 03:09 AM, Dongsheng Yang wrote:
> On 12/09/2014 11:42 PM, Josef Bacik wrote:
>> On 12/09/2014 06:27 AM, Dongsheng Yang wrote:
>>> When we exceed quota limit in writing, we will free
>>> some reserved extent when we need to drop but not free
>>> account in qgroup. It means, each time we exceed quota
>>> in writing, there will be some remain space in qg->reserved
>>> we can not use any more. If things go on like this, the
>>> all space will be ate up.
>>>
>>> Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
>>> ---
>>>   fs/btrfs/extent-tree.c | 5 ++++-
>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index a84e00d..014b7f2 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -5262,8 +5262,11 @@ out_fail:
>>>               to_free = 0;
>>>       }
>>>       spin_unlock(&BTRFS_I(inode)->lock);
>>> -    if (dropped)
>>> +    if (dropped) {
>>> +        if (root->fs_info->quota_enabled)
>>> +            btrfs_qgroup_free(root, dropped * root->nodesize);
>>
>> This needs to be num_bytes + dropped * root->nodesize.  Thanks,
>
> Let me try to explain why it did not free num_bytes here.
>
> In out_fail, we did not reserve num_bytes in qgroup successfully, then
> we do not need
> to free it in out_fail.
>
> The problem this patch attempts to solve is that, when we run into
> out_fail here,
> we will drop a outstanding extent. That said, in out_fail here, the
> extra reserved
> nodesize for some extents should be freed.
>
> Example:
>      1). BTRFS_I(inode)->reserved_extents: 2,
> BTRFS_I(inode)->outstanding_extents: 1.
>            In this case, we go intobtrfs_delalloc_reserve_metadata().
> outstanding_extents
>            will be increased at first. then
> BTRFS_I(inode)->outstanding_extents is 2.
>            If we want to reserve space and failed. it will goto out_fail.
>      2). In out_failed: reserved_extents is 2, outstanding_extents is 2.
> we will get a dropped of 1
>           from dropping_outstanding_extent(). And now,
> reserved_extents:1, outstanding_extents:1.
>
> In step 2, we just decrease the reserved_extents without freeing the
> related nodesize in qgroup
> at the same time. So it will cause the problem I described in changelog
> which will eat the space.
>
> Therefore, this patch here will free the nodesize related with the
> dropped extents in step 2.
> About the num_bytes, as we did not reserve it successfully, no need to
> free it.
>
> With my poor english, there must be something confusing in my
> description. Please correct me
> if anything is wrong or not-good-explained.
>

So none of this made sense, I went back and read the code just now and 
this was the part I was missing

if (unlikely(ret)) {
         if (root->fs_info->quota_enabled)
                 btrfs_qgroup_free(root, num_bytes +
                                         nr_extents * root->nodesize);
         goto out_fail;
}

All you needed to say was we already free up what we reserved, we're 
just dropping anything we needed to free because of 
drop_outstanding_extents ;).  You can add

Reviewed-by: Josef Bacik <jbacik@fb.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] Btrfs: qgroup: free reserved in exceeding quota.
  2014-12-11 18:25       ` Josef Bacik
@ 2014-12-12  0:44         ` Dongsheng Yang
  2014-12-12  8:44           ` [PATCH v2 " Dongsheng Yang
  0 siblings, 1 reply; 20+ messages in thread
From: Dongsheng Yang @ 2014-12-12  0:44 UTC (permalink / raw)
  To: Josef Bacik, cyril.scetbon; +Cc: linux-btrfs

On 12/12/2014 02:25 AM, Josef Bacik wrote:
> On 12/10/2014 03:09 AM, Dongsheng Yang wrote:
>> On 12/09/2014 11:42 PM, Josef Bacik wrote:
>>> On 12/09/2014 06:27 AM, Dongsheng Yang wrote:
>>>> When we exceed quota limit in writing, we will free
>>>> some reserved extent when we need to drop but not free
>>>> account in qgroup. It means, each time we exceed quota
>>>> in writing, there will be some remain space in qg->reserved
>>>> we can not use any more. If things go on like this, the
>>>> all space will be ate up.
>>>>
>>>> Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
>>>> ---
>>>>   fs/btrfs/extent-tree.c | 5 ++++-
>>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>>> index a84e00d..014b7f2 100644
>>>> --- a/fs/btrfs/extent-tree.c
>>>> +++ b/fs/btrfs/extent-tree.c
>>>> @@ -5262,8 +5262,11 @@ out_fail:
>>>>               to_free = 0;
>>>>       }
>>>>       spin_unlock(&BTRFS_I(inode)->lock);
>>>> -    if (dropped)
>>>> +    if (dropped) {
>>>> +        if (root->fs_info->quota_enabled)
>>>> +            btrfs_qgroup_free(root, dropped * root->nodesize);
>>>
>>> This needs to be num_bytes + dropped * root->nodesize. Thanks,
>>
>> Let me try to explain why it did not free num_bytes here.
>>
>> In out_fail, we did not reserve num_bytes in qgroup successfully, then
>> we do not need
>> to free it in out_fail.
>>
>> The problem this patch attempts to solve is that, when we run into
>> out_fail here,
>> we will drop a outstanding extent. That said, in out_fail here, the
>> extra reserved
>> nodesize for some extents should be freed.
>>
>> Example:
>>      1). BTRFS_I(inode)->reserved_extents: 2,
>> BTRFS_I(inode)->outstanding_extents: 1.
>>            In this case, we go intobtrfs_delalloc_reserve_metadata().
>> outstanding_extents
>>            will be increased at first. then
>> BTRFS_I(inode)->outstanding_extents is 2.
>>            If we want to reserve space and failed. it will goto 
>> out_fail.
>>      2). In out_failed: reserved_extents is 2, outstanding_extents is 2.
>> we will get a dropped of 1
>>           from dropping_outstanding_extent(). And now,
>> reserved_extents:1, outstanding_extents:1.
>>
>> In step 2, we just decrease the reserved_extents without freeing the
>> related nodesize in qgroup
>> at the same time. So it will cause the problem I described in changelog
>> which will eat the space.
>>
>> Therefore, this patch here will free the nodesize related with the
>> dropped extents in step 2.
>> About the num_bytes, as we did not reserve it successfully, no need to
>> free it.
>>
>> With my poor english, there must be something confusing in my
>> description. Please correct me
>> if anything is wrong or not-good-explained.
>>
>
> So none of this made sense, I went back and read the code just now and 
> this was the part I was missing
>
> if (unlikely(ret)) {
>         if (root->fs_info->quota_enabled)
>                 btrfs_qgroup_free(root, num_bytes +
>                                         nr_extents * root->nodesize);
>         goto out_fail;
> }
>
> All you needed to say was we already free up what we reserved, we're 
> just dropping anything we needed to free because of 
> drop_outstanding_extents ;).  You can add
>
> Reviewed-by: Josef Bacik <jbacik@fb.com>

Thanx Josef. :)
>
> Thanks,
>
> Josef
> .
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/2] Btrfs: qgroup: free reserved in exceeding quota.
  2014-12-12  0:44         ` Dongsheng Yang
@ 2014-12-12  8:44           ` Dongsheng Yang
  2014-12-12  8:44             ` [PATCH v2 2/2] Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use Dongsheng Yang
  0 siblings, 1 reply; 20+ messages in thread
From: Dongsheng Yang @ 2014-12-12  8:44 UTC (permalink / raw)
  To: jbacik; +Cc: linux-btrfs, Dongsheng Yang

When we exceed quota limit in writing, we will free
some reserved extent when we need to drop but not free
account in qgroup. It means, each time we exceed quota
in writing, there will be some remain space in qg->reserved
we can not use any more. If things go on like this, the
all space will be ate up.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
---
 fs/btrfs/extent-tree.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a84e00d..014b7f2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5262,8 +5262,11 @@ out_fail:
 			to_free = 0;
 	}
 	spin_unlock(&BTRFS_I(inode)->lock);
-	if (dropped)
+	if (dropped) {
+		if (root->fs_info->quota_enabled)
+			btrfs_qgroup_free(root, dropped * root->nodesize);
 		to_free += btrfs_calc_trans_metadata_size(root, dropped);
+	}
 
 	if (to_free) {
 		btrfs_block_rsv_release(root, block_rsv, to_free);
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 2/2] Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use.
  2014-12-12  8:44           ` [PATCH v2 " Dongsheng Yang
@ 2014-12-12  8:44             ` Dongsheng Yang
       [not found]               ` <549CBAB1.3070909@cn.fujitsu.com>
  0 siblings, 1 reply; 20+ messages in thread
From: Dongsheng Yang @ 2014-12-12  8:44 UTC (permalink / raw)
  To: jbacik; +Cc: linux-btrfs, Dongsheng Yang

Currently, for pre_alloc or delay_alloc, the bytes will be accounted
in space_info by the three guys.
space_info->bytes_may_use --- space_info->reserved --- space_info->used.
But on the other hand, in qgroup, there are only two counters to account the
bytes, qgroup->reserved and qgroup->excl. And qg->reserved accounts
bytes in space_info->bytes_may_use and qg->excl accounts bytes in
space_info->used. So the bytes in space_info->reserved is not accounted
in qgroup. If so, there is a window we can exceed the quota limit when
bytes is in space_info->reserved.

Example:
	# btrfs quota enable /mnt
	# btrfs qgroup limit -e 10M /mnt
	# for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
	# sync
	# btrfs qgroup show -pcre /mnt
qgroupid rfer     excl     max_rfer max_excl parent  child
-------- ----     ----     -------- -------- ------  -----
0/5      20987904 20987904 0        10485760 ---     ---

qg->excl is 20987904 larger than max_excl 10485760.

This patch introduce a new counter named may_use to qgroup, then
there are three counters in qgroup to account bytes in space_info
as below.
space_info->bytes_may_use --- space_info->reserved --- space_info->used.
qgroup->may_use           --- qgroup->reserved     --- qgroup->excl

With this patch applied:
	# btrfs quota enable /mnt
	# btrfs qgroup limit -e 10M /mnt
	# for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
fallocate: /mnt/data9: fallocate failed: Disk quota exceeded
fallocate: /mnt/data10: fallocate failed: Disk quota exceeded
fallocate: /mnt/data11: fallocate failed: Disk quota exceeded
fallocate: /mnt/data12: fallocate failed: Disk quota exceeded
fallocate: /mnt/data13: fallocate failed: Disk quota exceeded
fallocate: /mnt/data14: fallocate failed: Disk quota exceeded
fallocate: /mnt/data15: fallocate failed: Disk quota exceeded
fallocate: /mnt/data16: fallocate failed: Disk quota exceeded
fallocate: /mnt/data17: fallocate failed: Disk quota exceeded
fallocate: /mnt/data18: fallocate failed: Disk quota exceeded
fallocate: /mnt/data19: fallocate failed: Disk quota exceeded
	# sync
	# btrfs qgroup show -pcre /mnt
qgroupid rfer    excl    max_rfer max_excl parent  child
-------- ----    ----    -------- -------- ------  -----
0/5      9453568 9453568 0        10485760 ---     ---

Reported-by: Cyril SCETBON <cyril.scetbon@free.fr>
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
---
Changelog:
	v1 -> v2:
		Remove the redundant check for fs_info->quota_enabled.

 fs/btrfs/extent-tree.c | 20 ++++++++++++++-
 fs/btrfs/inode.c       | 18 ++++++++++++-
 fs/btrfs/qgroup.c      | 68 +++++++++++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/qgroup.h      |  4 +++
 4 files changed, 104 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 014b7f2..f4ad737 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5500,8 +5500,12 @@ static int pin_down_extent(struct btrfs_root *root,
 
 	set_extent_dirty(root->fs_info->pinned_extents, bytenr,
 			 bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
-	if (reserved)
+	if (reserved) {
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root->root_key.objectid,
+						   num_bytes, -1);
 		trace_btrfs_reserved_extent_free(root, bytenr, num_bytes);
+	}
 	return 0;
 }
 
@@ -6230,6 +6234,9 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 		btrfs_update_reserved_bytes(cache, buf->len, RESERVE_FREE, 0);
 		trace_btrfs_reserved_extent_free(root, buf->start, buf->len);
 		pin = 0;
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root->root_key.objectid,
+						   buf->len, -1);
 	}
 out:
 	if (pin)
@@ -6964,7 +6971,11 @@ static int __btrfs_free_reserved_extent(struct btrfs_root *root,
 	else {
 		btrfs_add_free_space(cache, start, len);
 		btrfs_update_reserved_bytes(cache, len, RESERVE_FREE, delalloc);
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root->root_key.objectid,
+						   len, -1);
 	}
+
 	btrfs_put_block_group(cache);
 
 	trace_btrfs_reserved_extent_free(root, start, len);
@@ -7200,6 +7211,9 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
 	BUG_ON(ret); /* logic error */
 	ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
 					 0, owner, offset, ins, 1);
+	btrfs_qgroup_update_reserved_bytes(root->fs_info,
+					   root->root_key.objectid,
+					   ins->offset, 1);
 	btrfs_put_block_group(block_group);
 	return ret;
 }
@@ -7346,6 +7360,10 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
 		return ERR_PTR(ret);
 	}
 
+	btrfs_qgroup_update_reserved_bytes(root->fs_info,
+					   root_objectid,
+					   ins.offset, 1);
+
 	buf = btrfs_init_new_buffer(trans, root, ins.objectid,
 				    blocksize, level);
 	BUG_ON(IS_ERR(buf)); /* -ENOMEM */
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d23362f..0c824f3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -59,6 +59,7 @@
 #include "backref.h"
 #include "hash.h"
 #include "props.h"
+#include "qgroup.h"
 
 struct btrfs_iget_args {
 	struct btrfs_key *location;
@@ -739,7 +740,9 @@ retry:
 			}
 			goto out_free;
 		}
-
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root->root_key.objectid,
+						   ins.offset, 1);
 		/*
 		 * here we're doing allocation and writeback of the
 		 * compressed pages
@@ -951,6 +954,10 @@ static noinline int cow_file_range(struct inode *inode,
 		if (ret < 0)
 			goto out_unlock;
 
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root->root_key.objectid,
+						   ins.offset, 1);
+
 		em = alloc_extent_map();
 		if (!em) {
 			ret = -ENOMEM;
@@ -6745,6 +6752,10 @@ static struct extent_map *btrfs_new_extent_direct(struct inode *inode,
 		return ERR_PTR(ret);
 	}
 
+	btrfs_qgroup_update_reserved_bytes(root->fs_info,
+					   root->root_key.objectid,
+					   ins.offset, 1);
+
 	return em;
 }
 
@@ -9266,6 +9277,11 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 				btrfs_end_transaction(trans, root);
 			break;
 		}
+
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root->root_key.objectid,
+						   ins.offset, 1);
+
 		btrfs_drop_extent_cache(inode, cur_offset,
 					cur_offset + ins.offset -1, 0);
 
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 48b60db..d147082 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -72,6 +72,7 @@ struct btrfs_qgroup {
 	/*
 	 * reservation tracking
 	 */
+	u64 may_use;
 	u64 reserved;
 
 	/*
@@ -1414,6 +1415,8 @@ static int qgroup_excl_accounting(struct btrfs_fs_info *fs_info,
 	WARN_ON(sign < 0 && qgroup->excl < oper->num_bytes);
 	qgroup->excl += sign * oper->num_bytes;
 	qgroup->excl_cmpr += sign * oper->num_bytes;
+	if (sign > 0)
+		qgroup->reserved -= oper->num_bytes;
 
 	qgroup_dirty(fs_info, qgroup);
 
@@ -1434,6 +1437,8 @@ static int qgroup_excl_accounting(struct btrfs_fs_info *fs_info,
 		qgroup->excl += sign * oper->num_bytes;
 		if (sign < 0)
 			WARN_ON(qgroup->excl < oper->num_bytes);
+		if (sign > 0)
+			qgroup->reserved -= oper->num_bytes;
 		qgroup->excl_cmpr += sign * oper->num_bytes;
 		qgroup_dirty(fs_info, qgroup);
 
@@ -2359,6 +2364,61 @@ out:
 	return ret;
 }
 
+int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
+					    u64 ref_root,
+					    u64 num_bytes,
+					    int sign)
+{
+	struct btrfs_root *quota_root;
+	struct btrfs_qgroup *qgroup;
+	int ret = 0;
+	struct ulist_node *unode;
+	struct ulist_iterator uiter;
+
+	if (!is_fstree(ref_root) || !fs_info->quota_enabled)
+		return 0;
+
+	if (num_bytes == 0)
+		return 0;
+
+	spin_lock(&fs_info->qgroup_lock);
+	quota_root = fs_info->quota_root;
+	if (!quota_root)
+		goto out;
+
+	qgroup = find_qgroup_rb(fs_info, ref_root);
+	if (!qgroup)
+		goto out;
+
+	ulist_reinit(fs_info->qgroup_ulist);
+	ret = ulist_add(fs_info->qgroup_ulist, qgroup->qgroupid,
+			(uintptr_t)qgroup, GFP_ATOMIC);
+	if (ret < 0)
+		goto out;
+
+	ULIST_ITER_INIT(&uiter);
+	while ((unode = ulist_next(fs_info->qgroup_ulist, &uiter))) {
+		struct btrfs_qgroup *qg;
+		struct btrfs_qgroup_list *glist;
+
+		qg = u64_to_ptr(unode->aux);
+
+		qg->reserved += sign * num_bytes;
+
+		list_for_each_entry(glist, &qg->groups, next_group) {
+			ret = ulist_add(fs_info->qgroup_ulist,
+					glist->group->qgroupid,
+					(uintptr_t)glist->group, GFP_ATOMIC);
+			if (ret < 0)
+				goto out;
+		}
+	}
+
+out:
+	spin_unlock(&fs_info->qgroup_lock);
+	return ret;
+}
+
 /*
  * reserve some space for a qgroup and all its parents. The reservation takes
  * place with start_transaction or dealloc_reserve, similar to ENOSPC
@@ -2407,14 +2467,14 @@ int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
 		qg = u64_to_ptr(unode->aux);
 
 		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
-		    qg->reserved + (s64)qg->rfer + num_bytes >
+		    qg->reserved + qg->may_use + (s64)qg->rfer + num_bytes >
 		    qg->max_rfer) {
 			ret = -EDQUOT;
 			goto out;
 		}
 
 		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) &&
-		    qg->reserved + (s64)qg->excl + num_bytes >
+		    qg->reserved + qg->may_use + (s64)qg->excl + num_bytes >
 		    qg->max_excl) {
 			ret = -EDQUOT;
 			goto out;
@@ -2438,7 +2498,7 @@ int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
 
 		qg = u64_to_ptr(unode->aux);
 
-		qg->reserved += num_bytes;
+		qg->may_use += num_bytes;
 	}
 
 out:
@@ -2484,7 +2544,7 @@ void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes)
 
 		qg = u64_to_ptr(unode->aux);
 
-		qg->reserved -= num_bytes;
+		qg->may_use -= num_bytes;
 
 		list_for_each_entry(glist, &qg->groups, next_group) {
 			ret = ulist_add(fs_info->qgroup_ulist,
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 18cc68c..31de88e 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -95,6 +95,10 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
 			 struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
 			 struct btrfs_qgroup_inherit *inherit);
+int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
+				       u64 ref_root,
+				       u64 num_bytes,
+				       int sign);
 int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
 void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);
 
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2] Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use.
       [not found]               ` <549CBAB1.3070909@cn.fujitsu.com>
@ 2014-12-26  5:43                 ` Satoru Takeuchi
  2014-12-26  6:49                   ` Dongsheng Yang
                                     ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Satoru Takeuchi @ 2014-12-26  5:43 UTC (permalink / raw)
  To: Dongsheng Yang; +Cc: jbacik, linux-btrfs

Hi Yang,

On 2014/12/26 10:32, Dongsheng Yang wrote:
> Hi Satoru,
>
> I saw your mail of "[BUG] "Quota Ignored On write" problem still exist with 3.16-rc5 <http://news.gmane.org/find-root.php?message_id=53C8DEB0.1060404%40jp.fujitsu.com>" in gmane.
> I guess this patch will fix the problem you mentioned.
> Could you help to give a try?

Although it reached disk quota before writing whole 1.5GB,
the copied size was only about 700 MB. In this case,
expected behavior is to reach disk quota just after
writing 1 GB.

* 3.19-rc1 without your two v2 patches

===============================================================================
+ mkfs.btrfs -f /dev/vdb
Btrfs v3.17
See http://btrfs.wiki.kernel.org for more information.

Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
fs created label (null) on /dev/vdb
         nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
+ mount /dev/vdb /root/btrfs-auto-test/
+ ret=0
+ btrfs quota enable /root/btrfs-auto-test/
+ btrfs subvolume create /root/btrfs-auto-test//sub
Create subvolume '/root/btrfs-auto-test/sub'
+ btrfs qgroup limit 1G /root/btrfs-auto-test//sub
+ dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=1500000
1500000+0 records in
1500000+0 records out
1536000000 bytes (1.5 GB) copied, 25.6601 s, 59.9 MB/s
===============================================================================

* 3.19-rc1 with your two v2 patches

===============================================================================
+ mkfs.btrfs -f /dev/vdb
Btrfs v3.17
See http://btrfs.wiki.kernel.org for more information.

Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
fs created label (null) on /dev/vdb
         nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
+ mount /dev/vdb /root/btrfs-auto-test/
+ ret=0
+ btrfs quota enable /root/btrfs-auto-test/
+ btrfs subvolume create /root/btrfs-auto-test//sub
Create subvolume '/root/btrfs-auto-test/sub'
+ btrfs qgroup limit 1G /root/btrfs-auto-test//sub
+ dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=1500000
dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded
681353+0 records in
681352+0 records out
697704448 bytes (698 MB) copied, 8.15563 s, 85.5 MB/s
===============================================================================

Thanks,
Satoru

>
> Thanx
> Yang
>
> On 12/12/2014 04:44 PM, Dongsheng Yang wrote:
>> Currently, for pre_alloc or delay_alloc, the bytes will be accounted
>> in space_info by the three guys.
>> space_info->bytes_may_use --- space_info->reserved --- space_info->used.
>> But on the other hand, in qgroup, there are only two counters to account the
>> bytes, qgroup->reserved and qgroup->excl. And qg->reserved accounts
>> bytes in space_info->bytes_may_use and qg->excl accounts bytes in
>> space_info->used. So the bytes in space_info->reserved is not accounted
>> in qgroup. If so, there is a window we can exceed the quota limit when
>> bytes is in space_info->reserved.
>>
>> Example:
>> 	# btrfs quota enable /mnt
>> 	# btrfs qgroup limit -e 10M /mnt
>> 	# for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
>> 	# sync
>> 	# btrfs qgroup show -pcre /mnt
>> qgroupid rfer     excl     max_rfer max_excl parent  child
>> -------- ----     ----     -------- -------- ------  -----
>> 0/5      20987904 20987904 0        10485760 ---     ---
>>
>> qg->excl is 20987904 larger than max_excl 10485760.
>>
>> This patch introduce a new counter named may_use to qgroup, then
>> there are three counters in qgroup to account bytes in space_info
>> as below.
>> space_info->bytes_may_use --- space_info->reserved --- space_info->used.
>> qgroup->may_use           --- qgroup->reserved     --- qgroup->excl
>>
>> With this patch applied:
>> 	# btrfs quota enable /mnt
>> 	# btrfs qgroup limit -e 10M /mnt
>> 	# for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
>> fallocate: /mnt/data9: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data10: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data11: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data12: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data13: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data14: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data15: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data16: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data17: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data18: fallocate failed: Disk quota exceeded
>> fallocate: /mnt/data19: fallocate failed: Disk quota exceeded
>> 	# sync
>> 	# btrfs qgroup show -pcre /mnt
>> qgroupid rfer    excl    max_rfer max_excl parent  child
>> -------- ----    ----    -------- -------- ------  -----
>> 0/5      9453568 9453568 0        10485760 ---     ---
>>
>> Reported-by: Cyril SCETBON<cyril.scetbon@free.fr>
>> Signed-off-by: Dongsheng Yang<yangds.fnst@cn.fujitsu.com>
>> ---
>> Changelog:
>> 	v1 -> v2:
>> 		Remove the redundant check for fs_info->quota_enabled.
>>
>>   fs/btrfs/extent-tree.c | 20 ++++++++++++++-
>>   fs/btrfs/inode.c       | 18 ++++++++++++-
>>   fs/btrfs/qgroup.c      | 68 +++++++++++++++++++++++++++++++++++++++++++++++---
>>   fs/btrfs/qgroup.h      |  4 +++
>>   4 files changed, 104 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 014b7f2..f4ad737 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -5500,8 +5500,12 @@ static int pin_down_extent(struct btrfs_root *root,
>>
>>   	set_extent_dirty(root->fs_info->pinned_extents, bytenr,
>>   			 bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
>> -	if (reserved)
>> +	if (reserved) {
>> +		btrfs_qgroup_update_reserved_bytes(root->fs_info,
>> +						   root->root_key.objectid,
>> +						   num_bytes, -1);
>>   		trace_btrfs_reserved_extent_free(root, bytenr, num_bytes);
>> +	}
>>   	return 0;
>>   }
>>
>> @@ -6230,6 +6234,9 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
>>   		btrfs_update_reserved_bytes(cache, buf->len, RESERVE_FREE, 0);
>>   		trace_btrfs_reserved_extent_free(root, buf->start, buf->len);
>>   		pin = 0;
>> +		btrfs_qgroup_update_reserved_bytes(root->fs_info,
>> +						   root->root_key.objectid,
>> +						   buf->len, -1);
>>   	}
>>   out:
>>   	if (pin)
>> @@ -6964,7 +6971,11 @@ static int __btrfs_free_reserved_extent(struct btrfs_root *root,
>>   	else {
>>   		btrfs_add_free_space(cache, start, len);
>>   		btrfs_update_reserved_bytes(cache, len, RESERVE_FREE, delalloc);
>> +		btrfs_qgroup_update_reserved_bytes(root->fs_info,
>> +						   root->root_key.objectid,
>> +						   len, -1);
>>   	}
>> +
>>   	btrfs_put_block_group(cache);
>>
>>   	trace_btrfs_reserved_extent_free(root, start, len);
>> @@ -7200,6 +7211,9 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
>>   	BUG_ON(ret); /* logic error */
>>   	ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
>>   					 0, owner, offset, ins, 1);
>> +	btrfs_qgroup_update_reserved_bytes(root->fs_info,
>> +					   root->root_key.objectid,
>> +					   ins->offset, 1);
>>   	btrfs_put_block_group(block_group);
>>   	return ret;
>>   }
>> @@ -7346,6 +7360,10 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
>>   		return ERR_PTR(ret);
>>   	}
>>
>> +	btrfs_qgroup_update_reserved_bytes(root->fs_info,
>> +					   root_objectid,
>> +					   ins.offset, 1);
>> +
>>   	buf = btrfs_init_new_buffer(trans, root, ins.objectid,
>>   				    blocksize, level);
>>   	BUG_ON(IS_ERR(buf)); /* -ENOMEM */
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index d23362f..0c824f3 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -59,6 +59,7 @@
>>   #include "backref.h"
>>   #include "hash.h"
>>   #include "props.h"
>> +#include "qgroup.h"
>>
>>   struct btrfs_iget_args {
>>   	struct btrfs_key *location;
>> @@ -739,7 +740,9 @@ retry:
>>   			}
>>   			goto out_free;
>>   		}
>> -
>> +		btrfs_qgroup_update_reserved_bytes(root->fs_info,
>> +						   root->root_key.objectid,
>> +						   ins.offset, 1);
>>   		/*
>>   		 * here we're doing allocation and writeback of the
>>   		 * compressed pages
>> @@ -951,6 +954,10 @@ static noinline int cow_file_range(struct inode *inode,
>>   		if (ret < 0)
>>   			goto out_unlock;
>>
>> +		btrfs_qgroup_update_reserved_bytes(root->fs_info,
>> +						   root->root_key.objectid,
>> +						   ins.offset, 1);
>> +
>>   		em = alloc_extent_map();
>>   		if (!em) {
>>   			ret = -ENOMEM;
>> @@ -6745,6 +6752,10 @@ static struct extent_map *btrfs_new_extent_direct(struct inode *inode,
>>   		return ERR_PTR(ret);
>>   	}
>>
>> +	btrfs_qgroup_update_reserved_bytes(root->fs_info,
>> +					   root->root_key.objectid,
>> +					   ins.offset, 1);
>> +
>>   	return em;
>>   }
>>
>> @@ -9266,6 +9277,11 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
>>   				btrfs_end_transaction(trans, root);
>>   			break;
>>   		}
>> +
>> +		btrfs_qgroup_update_reserved_bytes(root->fs_info,
>> +						   root->root_key.objectid,
>> +						   ins.offset, 1);
>> +
>>   		btrfs_drop_extent_cache(inode, cur_offset,
>>   					cur_offset + ins.offset -1, 0);
>>
>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>> index 48b60db..d147082 100644
>> --- a/fs/btrfs/qgroup.c
>> +++ b/fs/btrfs/qgroup.c
>> @@ -72,6 +72,7 @@ struct btrfs_qgroup {
>>   	/*
>>   	 * reservation tracking
>>   	 */
>> +	u64 may_use;
>>   	u64 reserved;
>>
>>   	/*
>> @@ -1414,6 +1415,8 @@ static int qgroup_excl_accounting(struct btrfs_fs_info *fs_info,
>>   	WARN_ON(sign < 0 && qgroup->excl < oper->num_bytes);
>>   	qgroup->excl += sign * oper->num_bytes;
>>   	qgroup->excl_cmpr += sign * oper->num_bytes;
>> +	if (sign > 0)
>> +		qgroup->reserved -= oper->num_bytes;
>>
>>   	qgroup_dirty(fs_info, qgroup);
>>
>> @@ -1434,6 +1437,8 @@ static int qgroup_excl_accounting(struct btrfs_fs_info *fs_info,
>>   		qgroup->excl += sign * oper->num_bytes;
>>   		if (sign < 0)
>>   			WARN_ON(qgroup->excl < oper->num_bytes);
>> +		if (sign > 0)
>> +			qgroup->reserved -= oper->num_bytes;
>>   		qgroup->excl_cmpr += sign * oper->num_bytes;
>>   		qgroup_dirty(fs_info, qgroup);
>>
>> @@ -2359,6 +2364,61 @@ out:
>>   	return ret;
>>   }
>>
>> +int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
>> +					    u64 ref_root,
>> +					    u64 num_bytes,
>> +					    int sign)
>> +{
>> +	struct btrfs_root *quota_root;
>> +	struct btrfs_qgroup *qgroup;
>> +	int ret = 0;
>> +	struct ulist_node *unode;
>> +	struct ulist_iterator uiter;
>> +
>> +	if (!is_fstree(ref_root) || !fs_info->quota_enabled)
>> +		return 0;
>> +
>> +	if (num_bytes == 0)
>> +		return 0;
>> +
>> +	spin_lock(&fs_info->qgroup_lock);
>> +	quota_root = fs_info->quota_root;
>> +	if (!quota_root)
>> +		goto out;
>> +
>> +	qgroup = find_qgroup_rb(fs_info, ref_root);
>> +	if (!qgroup)
>> +		goto out;
>> +
>> +	ulist_reinit(fs_info->qgroup_ulist);
>> +	ret = ulist_add(fs_info->qgroup_ulist, qgroup->qgroupid,
>> +			(uintptr_t)qgroup, GFP_ATOMIC);
>> +	if (ret < 0)
>> +		goto out;
>> +
>> +	ULIST_ITER_INIT(&uiter);
>> +	while ((unode = ulist_next(fs_info->qgroup_ulist, &uiter))) {
>> +		struct btrfs_qgroup *qg;
>> +		struct btrfs_qgroup_list *glist;
>> +
>> +		qg = u64_to_ptr(unode->aux);
>> +
>> +		qg->reserved += sign * num_bytes;
>> +
>> +		list_for_each_entry(glist, &qg->groups, next_group) {
>> +			ret = ulist_add(fs_info->qgroup_ulist,
>> +					glist->group->qgroupid,
>> +					(uintptr_t)glist->group, GFP_ATOMIC);
>> +			if (ret < 0)
>> +				goto out;
>> +		}
>> +	}
>> +
>> +out:
>> +	spin_unlock(&fs_info->qgroup_lock);
>> +	return ret;
>> +}
>> +
>>   /*
>>    * reserve some space for a qgroup and all its parents. The reservation takes
>>    * place with start_transaction or dealloc_reserve, similar to ENOSPC
>> @@ -2407,14 +2467,14 @@ int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
>>   		qg = u64_to_ptr(unode->aux);
>>
>>   		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
>> -		    qg->reserved + (s64)qg->rfer + num_bytes >
>> +		    qg->reserved + qg->may_use + (s64)qg->rfer + num_bytes >
>>   		    qg->max_rfer) {
>>   			ret = -EDQUOT;
>>   			goto out;
>>   		}
>>
>>   		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) &&
>> -		    qg->reserved + (s64)qg->excl + num_bytes >
>> +		    qg->reserved + qg->may_use + (s64)qg->excl + num_bytes >
>>   		    qg->max_excl) {
>>   			ret = -EDQUOT;
>>   			goto out;
>> @@ -2438,7 +2498,7 @@ int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
>>
>>   		qg = u64_to_ptr(unode->aux);
>>
>> -		qg->reserved += num_bytes;
>> +		qg->may_use += num_bytes;
>>   	}
>>
>>   out:
>> @@ -2484,7 +2544,7 @@ void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes)
>>
>>   		qg = u64_to_ptr(unode->aux);
>>
>> -		qg->reserved -= num_bytes;
>> +		qg->may_use -= num_bytes;
>>
>>   		list_for_each_entry(glist, &qg->groups, next_group) {
>>   			ret = ulist_add(fs_info->qgroup_ulist,
>> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
>> index 18cc68c..31de88e 100644
>> --- a/fs/btrfs/qgroup.h
>> +++ b/fs/btrfs/qgroup.h
>> @@ -95,6 +95,10 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
>>   int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
>>   			 struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
>>   			 struct btrfs_qgroup_inherit *inherit);
>> +int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
>> +				       u64 ref_root,
>> +				       u64 num_bytes,
>> +				       int sign);
>>   int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
>>   void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);
>>
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2] Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use.
  2014-12-26  5:43                 ` Satoru Takeuchi
@ 2014-12-26  6:49                   ` Dongsheng Yang
  2014-12-26  7:00                     ` Dongsheng Yang
  2014-12-28  2:03                   ` Dongsheng Yang
  2015-01-05  6:16                   ` [PATCH v3 0/3] Btrfs: Enhancment for qgroup Dongsheng Yang
  2 siblings, 1 reply; 20+ messages in thread
From: Dongsheng Yang @ 2014-12-26  6:49 UTC (permalink / raw)
  To: Satoru Takeuchi; +Cc: jbacik, linux-btrfs

On 12/26/2014 01:43 PM, Satoru Takeuchi wrote:
> Hi Yang,
>
> On 2014/12/26 10:32, Dongsheng Yang wrote:
>> Hi Satoru,
>>
>> I saw your mail of "[BUG] "Quota Ignored On write" problem still 
>> exist with 3.16-rc5 
>> <http://news.gmane.org/find-root.php?message_id=53C8DEB0.1060404%40jp.fujitsu.com>" 
>> in gmane.
>> I guess this patch will fix the problem you mentioned.
>> Could you help to give a try?
>
> Although it reached disk quota before writing whole 1.5GB,
> the copied size was only about 700 MB. In this case,
> expected behavior is to reach disk quota just after
> writing 1 GB.

Hi Satoru,
Thanx for your testing a lot. But stranger, I tried it in 3.19-rc1 and 
3.18, both works well
writing 1GB. Could you give me some more information about your testing 
machine?
And could you execute "sync && btrfs qgroup show -pecr $MNT" after writing?

*3.19-rc1 with my 2/2 patch applied.
=======================================================================
[root@atest-guest linux_btrfs]# uname -a
Linux atest-guest 3.19.0-rc1+ #66 SMP Fri Dec 26 10:38:12 EST 2014 
x86_64 x86_64 x86_64 GNU/Linux
[root@atest-guest linux_btrfs]# sh quota.sh
umount: /mnt: not mounted
Btrfs v3.17-1-gbef9fd4
See http://btrfs.wiki.kernel.org for more information.

Turning ON incompat feature 'extref': increased hardlink limit per file 
to 65536
fs created label (null) on /dev/vdb
     nodesize 16384 leafsize 16384 sectorsize 4096 size 50.00GiB
Create subvolume '/mnt/quota_test'
dd: error writing \u2018/mnt/quota_test/test\u2019: Disk quota exceeded
999842+0 records in
999841+0 records out
1023837184 bytes (1.0 GB) copied, 8.06996 s, 127 MB/s
[PASS] quota works correctly
[root@atest-guest linux_btrfs]# sync
[root@atest-guest linux_btrfs]# df -h /mnt/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb         50G  996M   49G   2% /mnt
[root@atest-guest linux_btrfs]# btrfs qgroup show -pecr /mnt/
qgroupid rfer       excl       max_rfer   max_excl parent  child
-------- ----       ----       --------   -------- ------  -----
0/5      16384      16384      0          0        ---     ---
0/256    1023868928 1023868928 1024000000 0        ---     ---
=============================================================
>
> * 3.19-rc1 without your two v2 patches
>
> =============================================================================== 
>
> + mkfs.btrfs -f /dev/vdb
> Btrfs v3.17
> See http://btrfs.wiki.kernel.org for more information.
>
> Turning ON incompat feature 'extref': increased hardlink limit per 
> file to 65536
> fs created label (null) on /dev/vdb
>         nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
> + mount /dev/vdb /root/btrfs-auto-test/
> + ret=0
> + btrfs quota enable /root/btrfs-auto-test/
> + btrfs subvolume create /root/btrfs-auto-test//sub
> Create subvolume '/root/btrfs-auto-test/sub'
> + btrfs qgroup limit 1G /root/btrfs-auto-test//sub
> + dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 
> count=1500000
> 1500000+0 records in
> 1500000+0 records out
> 1536000000 bytes (1.5 GB) copied, 25.6601 s, 59.9 MB/s
> =============================================================================== 
>
>
> * 3.19-rc1 with your two v2 patches
>
> =============================================================================== 
>
> + mkfs.btrfs -f /dev/vdb
> Btrfs v3.17
> See http://btrfs.wiki.kernel.org for more information.
>
> Turning ON incompat feature 'extref': increased hardlink limit per 
> file to 65536
> fs created label (null) on /dev/vdb
>         nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
> + mount /dev/vdb /root/btrfs-auto-test/
> + ret=0
> + btrfs quota enable /root/btrfs-auto-test/
> + btrfs subvolume create /root/btrfs-auto-test//sub
> Create subvolume '/root/btrfs-auto-test/sub'
> + btrfs qgroup limit 1G /root/btrfs-auto-test//sub
> + dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 
> count=1500000
> dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded
> 681353+0 records in
> 681352+0 records out
> 697704448 bytes (698 MB) copied, 8.15563 s, 85.5 MB/s
> =============================================================================== 
>
>
> Thanks,
> Satoru
>
>>
>> Thanx
>> Yang
>>
>> On 12/12/2014 04:44 PM, Dongsheng Yang wrote:
>>> Currently, for pre_alloc or delay_alloc, the bytes will be accounted
>>> in space_info by the three guys.
>>> space_info->bytes_may_use --- space_info->reserved --- 
>>> space_info->used.
>>> But on the other hand, in qgroup, there are only two counters to 
>>> account the
>>> bytes, qgroup->reserved and qgroup->excl. And qg->reserved accounts
>>> bytes in space_info->bytes_may_use and qg->excl accounts bytes in
>>> space_info->used. So the bytes in space_info->reserved is not accounted
>>> in qgroup. If so, there is a window we can exceed the quota limit when
>>> bytes is in space_info->reserved.
>>>
>>> Example:
>>>     # btrfs quota enable /mnt
>>>     # btrfs qgroup limit -e 10M /mnt
>>>     # for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
>>>     # sync
>>>     # btrfs qgroup show -pcre /mnt
>>> qgroupid rfer     excl     max_rfer max_excl parent  child
>>> -------- ----     ----     -------- -------- ------  -----
>>> 0/5      20987904 20987904 0        10485760 ---     ---
>>>
>>> qg->excl is 20987904 larger than max_excl 10485760.
>>>
>>> This patch introduce a new counter named may_use to qgroup, then
>>> there are three counters in qgroup to account bytes in space_info
>>> as below.
>>> space_info->bytes_may_use --- space_info->reserved --- 
>>> space_info->used.
>>> qgroup->may_use           --- qgroup->reserved     --- qgroup->excl
>>>
>>> With this patch applied:
>>>     # btrfs quota enable /mnt
>>>     # btrfs qgroup limit -e 10M /mnt
>>>     # for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
>>> fallocate: /mnt/data9: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data10: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data11: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data12: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data13: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data14: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data15: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data16: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data17: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data18: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data19: fallocate failed: Disk quota exceeded
>>>     # sync
>>>     # btrfs qgroup show -pcre /mnt
>>> qgroupid rfer    excl    max_rfer max_excl parent  child
>>> -------- ----    ----    -------- -------- ------  -----
>>> 0/5      9453568 9453568 0        10485760 ---     ---
>>>
>>> Reported-by: Cyril SCETBON<cyril.scetbon@free.fr>
>>> Signed-off-by: Dongsheng Yang<yangds.fnst@cn.fujitsu.com>
>>> ---
>>> Changelog:
>>>     v1 -> v2:
>>>         Remove the redundant check for fs_info->quota_enabled.
>>>
>>>   fs/btrfs/extent-tree.c | 20 ++++++++++++++-
>>>   fs/btrfs/inode.c       | 18 ++++++++++++-
>>>   fs/btrfs/qgroup.c      | 68 
>>> +++++++++++++++++++++++++++++++++++++++++++++++---
>>>   fs/btrfs/qgroup.h      |  4 +++
>>>   4 files changed, 104 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index 014b7f2..f4ad737 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -5500,8 +5500,12 @@ static int pin_down_extent(struct btrfs_root 
>>> *root,
>>>
>>>       set_extent_dirty(root->fs_info->pinned_extents, bytenr,
>>>                bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
>>> -    if (reserved)
>>> +    if (reserved) {
>>> +        btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                           root->root_key.objectid,
>>> +                           num_bytes, -1);
>>>           trace_btrfs_reserved_extent_free(root, bytenr, num_bytes);
>>> +    }
>>>       return 0;
>>>   }
>>>
>>> @@ -6230,6 +6234,9 @@ void btrfs_free_tree_block(struct 
>>> btrfs_trans_handle *trans,
>>>           btrfs_update_reserved_bytes(cache, buf->len, RESERVE_FREE, 
>>> 0);
>>>           trace_btrfs_reserved_extent_free(root, buf->start, buf->len);
>>>           pin = 0;
>>> +        btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                           root->root_key.objectid,
>>> +                           buf->len, -1);
>>>       }
>>>   out:
>>>       if (pin)
>>> @@ -6964,7 +6971,11 @@ static int 
>>> __btrfs_free_reserved_extent(struct btrfs_root *root,
>>>       else {
>>>           btrfs_add_free_space(cache, start, len);
>>>           btrfs_update_reserved_bytes(cache, len, RESERVE_FREE, 
>>> delalloc);
>>> +        btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                           root->root_key.objectid,
>>> +                           len, -1);
>>>       }
>>> +
>>>       btrfs_put_block_group(cache);
>>>
>>>       trace_btrfs_reserved_extent_free(root, start, len);
>>> @@ -7200,6 +7211,9 @@ int btrfs_alloc_logged_file_extent(struct 
>>> btrfs_trans_handle *trans,
>>>       BUG_ON(ret); /* logic error */
>>>       ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
>>>                        0, owner, offset, ins, 1);
>>> +    btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                       root->root_key.objectid,
>>> +                       ins->offset, 1);
>>>       btrfs_put_block_group(block_group);
>>>       return ret;
>>>   }
>>> @@ -7346,6 +7360,10 @@ struct extent_buffer 
>>> *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
>>>           return ERR_PTR(ret);
>>>       }
>>>
>>> +    btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                       root_objectid,
>>> +                       ins.offset, 1);
>>> +
>>>       buf = btrfs_init_new_buffer(trans, root, ins.objectid,
>>>                       blocksize, level);
>>>       BUG_ON(IS_ERR(buf)); /* -ENOMEM */
>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> index d23362f..0c824f3 100644
>>> --- a/fs/btrfs/inode.c
>>> +++ b/fs/btrfs/inode.c
>>> @@ -59,6 +59,7 @@
>>>   #include "backref.h"
>>>   #include "hash.h"
>>>   #include "props.h"
>>> +#include "qgroup.h"
>>>
>>>   struct btrfs_iget_args {
>>>       struct btrfs_key *location;
>>> @@ -739,7 +740,9 @@ retry:
>>>               }
>>>               goto out_free;
>>>           }
>>> -
>>> +        btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                           root->root_key.objectid,
>>> +                           ins.offset, 1);
>>>           /*
>>>            * here we're doing allocation and writeback of the
>>>            * compressed pages
>>> @@ -951,6 +954,10 @@ static noinline int cow_file_range(struct inode 
>>> *inode,
>>>           if (ret < 0)
>>>               goto out_unlock;
>>>
>>> +        btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                           root->root_key.objectid,
>>> +                           ins.offset, 1);
>>> +
>>>           em = alloc_extent_map();
>>>           if (!em) {
>>>               ret = -ENOMEM;
>>> @@ -6745,6 +6752,10 @@ static struct extent_map 
>>> *btrfs_new_extent_direct(struct inode *inode,
>>>           return ERR_PTR(ret);
>>>       }
>>>
>>> +    btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                       root->root_key.objectid,
>>> +                       ins.offset, 1);
>>> +
>>>       return em;
>>>   }
>>>
>>> @@ -9266,6 +9277,11 @@ static int __btrfs_prealloc_file_range(struct 
>>> inode *inode, int mode,
>>>                   btrfs_end_transaction(trans, root);
>>>               break;
>>>           }
>>> +
>>> +        btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                           root->root_key.objectid,
>>> +                           ins.offset, 1);
>>> +
>>>           btrfs_drop_extent_cache(inode, cur_offset,
>>>                       cur_offset + ins.offset -1, 0);
>>>
>>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>>> index 48b60db..d147082 100644
>>> --- a/fs/btrfs/qgroup.c
>>> +++ b/fs/btrfs/qgroup.c
>>> @@ -72,6 +72,7 @@ struct btrfs_qgroup {
>>>       /*
>>>        * reservation tracking
>>>        */
>>> +    u64 may_use;
>>>       u64 reserved;
>>>
>>>       /*
>>> @@ -1414,6 +1415,8 @@ static int qgroup_excl_accounting(struct 
>>> btrfs_fs_info *fs_info,
>>>       WARN_ON(sign < 0 && qgroup->excl < oper->num_bytes);
>>>       qgroup->excl += sign * oper->num_bytes;
>>>       qgroup->excl_cmpr += sign * oper->num_bytes;
>>> +    if (sign > 0)
>>> +        qgroup->reserved -= oper->num_bytes;
>>>
>>>       qgroup_dirty(fs_info, qgroup);
>>>
>>> @@ -1434,6 +1437,8 @@ static int qgroup_excl_accounting(struct 
>>> btrfs_fs_info *fs_info,
>>>           qgroup->excl += sign * oper->num_bytes;
>>>           if (sign < 0)
>>>               WARN_ON(qgroup->excl < oper->num_bytes);
>>> +        if (sign > 0)
>>> +            qgroup->reserved -= oper->num_bytes;
>>>           qgroup->excl_cmpr += sign * oper->num_bytes;
>>>           qgroup_dirty(fs_info, qgroup);
>>>
>>> @@ -2359,6 +2364,61 @@ out:
>>>       return ret;
>>>   }
>>>
>>> +int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
>>> +                        u64 ref_root,
>>> +                        u64 num_bytes,
>>> +                        int sign)
>>> +{
>>> +    struct btrfs_root *quota_root;
>>> +    struct btrfs_qgroup *qgroup;
>>> +    int ret = 0;
>>> +    struct ulist_node *unode;
>>> +    struct ulist_iterator uiter;
>>> +
>>> +    if (!is_fstree(ref_root) || !fs_info->quota_enabled)
>>> +        return 0;
>>> +
>>> +    if (num_bytes == 0)
>>> +        return 0;
>>> +
>>> +    spin_lock(&fs_info->qgroup_lock);
>>> +    quota_root = fs_info->quota_root;
>>> +    if (!quota_root)
>>> +        goto out;
>>> +
>>> +    qgroup = find_qgroup_rb(fs_info, ref_root);
>>> +    if (!qgroup)
>>> +        goto out;
>>> +
>>> +    ulist_reinit(fs_info->qgroup_ulist);
>>> +    ret = ulist_add(fs_info->qgroup_ulist, qgroup->qgroupid,
>>> +            (uintptr_t)qgroup, GFP_ATOMIC);
>>> +    if (ret < 0)
>>> +        goto out;
>>> +
>>> +    ULIST_ITER_INIT(&uiter);
>>> +    while ((unode = ulist_next(fs_info->qgroup_ulist, &uiter))) {
>>> +        struct btrfs_qgroup *qg;
>>> +        struct btrfs_qgroup_list *glist;
>>> +
>>> +        qg = u64_to_ptr(unode->aux);
>>> +
>>> +        qg->reserved += sign * num_bytes;
>>> +
>>> +        list_for_each_entry(glist, &qg->groups, next_group) {
>>> +            ret = ulist_add(fs_info->qgroup_ulist,
>>> +                    glist->group->qgroupid,
>>> +                    (uintptr_t)glist->group, GFP_ATOMIC);
>>> +            if (ret < 0)
>>> +                goto out;
>>> +        }
>>> +    }
>>> +
>>> +out:
>>> +    spin_unlock(&fs_info->qgroup_lock);
>>> +    return ret;
>>> +}
>>> +
>>>   /*
>>>    * reserve some space for a qgroup and all its parents. The 
>>> reservation takes
>>>    * place with start_transaction or dealloc_reserve, similar to ENOSPC
>>> @@ -2407,14 +2467,14 @@ int btrfs_qgroup_reserve(struct btrfs_root 
>>> *root, u64 num_bytes)
>>>           qg = u64_to_ptr(unode->aux);
>>>
>>>           if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
>>> -            qg->reserved + (s64)qg->rfer + num_bytes >
>>> +            qg->reserved + qg->may_use + (s64)qg->rfer + num_bytes >
>>>               qg->max_rfer) {
>>>               ret = -EDQUOT;
>>>               goto out;
>>>           }
>>>
>>>           if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) &&
>>> -            qg->reserved + (s64)qg->excl + num_bytes >
>>> +            qg->reserved + qg->may_use + (s64)qg->excl + num_bytes >
>>>               qg->max_excl) {
>>>               ret = -EDQUOT;
>>>               goto out;
>>> @@ -2438,7 +2498,7 @@ int btrfs_qgroup_reserve(struct btrfs_root 
>>> *root, u64 num_bytes)
>>>
>>>           qg = u64_to_ptr(unode->aux);
>>>
>>> -        qg->reserved += num_bytes;
>>> +        qg->may_use += num_bytes;
>>>       }
>>>
>>>   out:
>>> @@ -2484,7 +2544,7 @@ void btrfs_qgroup_free(struct btrfs_root 
>>> *root, u64 num_bytes)
>>>
>>>           qg = u64_to_ptr(unode->aux);
>>>
>>> -        qg->reserved -= num_bytes;
>>> +        qg->may_use -= num_bytes;
>>>
>>>           list_for_each_entry(glist, &qg->groups, next_group) {
>>>               ret = ulist_add(fs_info->qgroup_ulist,
>>> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
>>> index 18cc68c..31de88e 100644
>>> --- a/fs/btrfs/qgroup.h
>>> +++ b/fs/btrfs/qgroup.h
>>> @@ -95,6 +95,10 @@ int btrfs_run_qgroups(struct btrfs_trans_handle 
>>> *trans,
>>>   int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
>>>                struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
>>>                struct btrfs_qgroup_inherit *inherit);
>>> +int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
>>> +                       u64 ref_root,
>>> +                       u64 num_bytes,
>>> +                       int sign);
>>>   int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
>>>   void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);
>>>
>>
>
> .
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2] Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use.
  2014-12-26  6:49                   ` Dongsheng Yang
@ 2014-12-26  7:00                     ` Dongsheng Yang
  0 siblings, 0 replies; 20+ messages in thread
From: Dongsheng Yang @ 2014-12-26  7:00 UTC (permalink / raw)
  To: Satoru Takeuchi; +Cc: jbacik, linux-btrfs

On 12/26/2014 02:49 PM, Dongsheng Yang wrote:
> On 12/26/2014 01:43 PM, Satoru Takeuchi wrote:
>> Hi Yang,
>>
>> On 2014/12/26 10:32, Dongsheng Yang wrote:
>>> Hi Satoru,
>>>
>>> I saw your mail of "[BUG] "Quota Ignored On write" problem still 
>>> exist with 3.16-rc5 
>>> <http://news.gmane.org/find-root.php?message_id=53C8DEB0.1060404%40jp.fujitsu.com>" 
>>> in gmane.
>>> I guess this patch will fix the problem you mentioned.
>>> Could you help to give a try?
>>
>> Although it reached disk quota before writing whole 1.5GB,
>> the copied size was only about 700 MB. In this case,
>> expected behavior is to reach disk quota just after
>> writing 1 GB.
>
> Hi Satoru,
> Thanx for your testing a lot. But stranger 

Typo, "strange". :)
> I tried it in 3.19-rc1 and 3.18, both works well
> writing 1GB. Could you give me some more information about your 
> testing machine?
> And could you execute "sync && btrfs qgroup show -pecr $MNT" after 
> writing?
>
> *3.19-rc1 with my 2/2 patch applied.

I also tried with the two patches applied, and got the same result.

Thanx
> =======================================================================
> [root@atest-guest linux_btrfs]# uname -a
> Linux atest-guest 3.19.0-rc1+ #66 SMP Fri Dec 26 10:38:12 EST 2014 
> x86_64 x86_64 x86_64 GNU/Linux
> [root@atest-guest linux_btrfs]# sh quota.sh
> umount: /mnt: not mounted
> Btrfs v3.17-1-gbef9fd4
> See http://btrfs.wiki.kernel.org for more information.
>
> Turning ON incompat feature 'extref': increased hardlink limit per 
> file to 65536
> fs created label (null) on /dev/vdb
>     nodesize 16384 leafsize 16384 sectorsize 4096 size 50.00GiB
> Create subvolume '/mnt/quota_test'
> dd: error writing \u2018/mnt/quota_test/test\u2019: Disk quota exceeded
> 999842+0 records in
> 999841+0 records out
> 1023837184 bytes (1.0 GB) copied, 8.06996 s, 127 MB/s
> [PASS] quota works correctly
> [root@atest-guest linux_btrfs]# sync
> [root@atest-guest linux_btrfs]# df -h /mnt/
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/vdb         50G  996M   49G   2% /mnt
> [root@atest-guest linux_btrfs]# btrfs qgroup show -pecr /mnt/
> qgroupid rfer       excl       max_rfer   max_excl parent  child
> -------- ----       ----       --------   -------- ------  -----
> 0/5      16384      16384      0          0        ---     ---
> 0/256    1023868928 1023868928 1024000000 0        ---     ---
> =============================================================
>>
>> * 3.19-rc1 without your two v2 patches
>>
>> =============================================================================== 
>>
>> + mkfs.btrfs -f /dev/vdb
>> Btrfs v3.17
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> Turning ON incompat feature 'extref': increased hardlink limit per 
>> file to 65536
>> fs created label (null) on /dev/vdb
>>         nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
>> + mount /dev/vdb /root/btrfs-auto-test/
>> + ret=0
>> + btrfs quota enable /root/btrfs-auto-test/
>> + btrfs subvolume create /root/btrfs-auto-test//sub
>> Create subvolume '/root/btrfs-auto-test/sub'
>> + btrfs qgroup limit 1G /root/btrfs-auto-test//sub
>> + dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 
>> count=1500000
>> 1500000+0 records in
>> 1500000+0 records out
>> 1536000000 bytes (1.5 GB) copied, 25.6601 s, 59.9 MB/s
>> =============================================================================== 
>>
>>
>> * 3.19-rc1 with your two v2 patches
>>
>> =============================================================================== 
>>
>> + mkfs.btrfs -f /dev/vdb
>> Btrfs v3.17
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> Turning ON incompat feature 'extref': increased hardlink limit per 
>> file to 65536
>> fs created label (null) on /dev/vdb
>>         nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
>> + mount /dev/vdb /root/btrfs-auto-test/
>> + ret=0
>> + btrfs quota enable /root/btrfs-auto-test/
>> + btrfs subvolume create /root/btrfs-auto-test//sub
>> Create subvolume '/root/btrfs-auto-test/sub'
>> + btrfs qgroup limit 1G /root/btrfs-auto-test//sub
>> + dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 
>> count=1500000
>> dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded
>> 681353+0 records in
>> 681352+0 records out
>> 697704448 bytes (698 MB) copied, 8.15563 s, 85.5 MB/s
>> =============================================================================== 
>>
>>
>> Thanks,
>> Satoru
>>
>>>
>>> Thanx
>>> Yang
>>>
>>> On 12/12/2014 04:44 PM, Dongsheng Yang wrote:
>>>> Currently, for pre_alloc or delay_alloc, the bytes will be accounted
>>>> in space_info by the three guys.
>>>> space_info->bytes_may_use --- space_info->reserved --- 
>>>> space_info->used.
>>>> But on the other hand, in qgroup, there are only two counters to 
>>>> account the
>>>> bytes, qgroup->reserved and qgroup->excl. And qg->reserved accounts
>>>> bytes in space_info->bytes_may_use and qg->excl accounts bytes in
>>>> space_info->used. So the bytes in space_info->reserved is not 
>>>> accounted
>>>> in qgroup. If so, there is a window we can exceed the quota limit when
>>>> bytes is in space_info->reserved.
>>>>
>>>> Example:
>>>>     # btrfs quota enable /mnt
>>>>     # btrfs qgroup limit -e 10M /mnt
>>>>     # for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
>>>>     # sync
>>>>     # btrfs qgroup show -pcre /mnt
>>>> qgroupid rfer     excl     max_rfer max_excl parent  child
>>>> -------- ----     ----     -------- -------- ------  -----
>>>> 0/5      20987904 20987904 0        10485760 ---     ---
>>>>
>>>> qg->excl is 20987904 larger than max_excl 10485760.
>>>>
>>>> This patch introduce a new counter named may_use to qgroup, then
>>>> there are three counters in qgroup to account bytes in space_info
>>>> as below.
>>>> space_info->bytes_may_use --- space_info->reserved --- 
>>>> space_info->used.
>>>> qgroup->may_use           --- qgroup->reserved     --- qgroup->excl
>>>>
>>>> With this patch applied:
>>>>     # btrfs quota enable /mnt
>>>>     # btrfs qgroup limit -e 10M /mnt
>>>>     # for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
>>>> fallocate: /mnt/data9: fallocate failed: Disk quota exceeded
>>>> fallocate: /mnt/data10: fallocate failed: Disk quota exceeded
>>>> fallocate: /mnt/data11: fallocate failed: Disk quota exceeded
>>>> fallocate: /mnt/data12: fallocate failed: Disk quota exceeded
>>>> fallocate: /mnt/data13: fallocate failed: Disk quota exceeded
>>>> fallocate: /mnt/data14: fallocate failed: Disk quota exceeded
>>>> fallocate: /mnt/data15: fallocate failed: Disk quota exceeded
>>>> fallocate: /mnt/data16: fallocate failed: Disk quota exceeded
>>>> fallocate: /mnt/data17: fallocate failed: Disk quota exceeded
>>>> fallocate: /mnt/data18: fallocate failed: Disk quota exceeded
>>>> fallocate: /mnt/data19: fallocate failed: Disk quota exceeded
>>>>     # sync
>>>>     # btrfs qgroup show -pcre /mnt
>>>> qgroupid rfer    excl    max_rfer max_excl parent  child
>>>> -------- ----    ----    -------- -------- ------  -----
>>>> 0/5      9453568 9453568 0        10485760 ---     ---
>>>>
>>>> Reported-by: Cyril SCETBON<cyril.scetbon@free.fr>
>>>> Signed-off-by: Dongsheng Yang<yangds.fnst@cn.fujitsu.com>
>>>> ---
>>>> Changelog:
>>>>     v1 -> v2:
>>>>         Remove the redundant check for fs_info->quota_enabled.
>>>>
>>>>   fs/btrfs/extent-tree.c | 20 ++++++++++++++-
>>>>   fs/btrfs/inode.c       | 18 ++++++++++++-
>>>>   fs/btrfs/qgroup.c      | 68 
>>>> +++++++++++++++++++++++++++++++++++++++++++++++---
>>>>   fs/btrfs/qgroup.h      |  4 +++
>>>>   4 files changed, 104 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>>> index 014b7f2..f4ad737 100644
>>>> --- a/fs/btrfs/extent-tree.c
>>>> +++ b/fs/btrfs/extent-tree.c
>>>> @@ -5500,8 +5500,12 @@ static int pin_down_extent(struct btrfs_root 
>>>> *root,
>>>>
>>>>       set_extent_dirty(root->fs_info->pinned_extents, bytenr,
>>>>                bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
>>>> -    if (reserved)
>>>> +    if (reserved) {
>>>> + btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>>> +                           root->root_key.objectid,
>>>> +                           num_bytes, -1);
>>>>           trace_btrfs_reserved_extent_free(root, bytenr, num_bytes);
>>>> +    }
>>>>       return 0;
>>>>   }
>>>>
>>>> @@ -6230,6 +6234,9 @@ void btrfs_free_tree_block(struct 
>>>> btrfs_trans_handle *trans,
>>>>           btrfs_update_reserved_bytes(cache, buf->len, 
>>>> RESERVE_FREE, 0);
>>>>           trace_btrfs_reserved_extent_free(root, buf->start, 
>>>> buf->len);
>>>>           pin = 0;
>>>> + btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>>> +                           root->root_key.objectid,
>>>> +                           buf->len, -1);
>>>>       }
>>>>   out:
>>>>       if (pin)
>>>> @@ -6964,7 +6971,11 @@ static int 
>>>> __btrfs_free_reserved_extent(struct btrfs_root *root,
>>>>       else {
>>>>           btrfs_add_free_space(cache, start, len);
>>>>           btrfs_update_reserved_bytes(cache, len, RESERVE_FREE, 
>>>> delalloc);
>>>> + btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>>> +                           root->root_key.objectid,
>>>> +                           len, -1);
>>>>       }
>>>> +
>>>>       btrfs_put_block_group(cache);
>>>>
>>>>       trace_btrfs_reserved_extent_free(root, start, len);
>>>> @@ -7200,6 +7211,9 @@ int btrfs_alloc_logged_file_extent(struct 
>>>> btrfs_trans_handle *trans,
>>>>       BUG_ON(ret); /* logic error */
>>>>       ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
>>>>                        0, owner, offset, ins, 1);
>>>> +    btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>>> +                       root->root_key.objectid,
>>>> +                       ins->offset, 1);
>>>>       btrfs_put_block_group(block_group);
>>>>       return ret;
>>>>   }
>>>> @@ -7346,6 +7360,10 @@ struct extent_buffer 
>>>> *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
>>>>           return ERR_PTR(ret);
>>>>       }
>>>>
>>>> +    btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>>> +                       root_objectid,
>>>> +                       ins.offset, 1);
>>>> +
>>>>       buf = btrfs_init_new_buffer(trans, root, ins.objectid,
>>>>                       blocksize, level);
>>>>       BUG_ON(IS_ERR(buf)); /* -ENOMEM */
>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>>> index d23362f..0c824f3 100644
>>>> --- a/fs/btrfs/inode.c
>>>> +++ b/fs/btrfs/inode.c
>>>> @@ -59,6 +59,7 @@
>>>>   #include "backref.h"
>>>>   #include "hash.h"
>>>>   #include "props.h"
>>>> +#include "qgroup.h"
>>>>
>>>>   struct btrfs_iget_args {
>>>>       struct btrfs_key *location;
>>>> @@ -739,7 +740,9 @@ retry:
>>>>               }
>>>>               goto out_free;
>>>>           }
>>>> -
>>>> + btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>>> +                           root->root_key.objectid,
>>>> +                           ins.offset, 1);
>>>>           /*
>>>>            * here we're doing allocation and writeback of the
>>>>            * compressed pages
>>>> @@ -951,6 +954,10 @@ static noinline int cow_file_range(struct 
>>>> inode *inode,
>>>>           if (ret < 0)
>>>>               goto out_unlock;
>>>>
>>>> + btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>>> +                           root->root_key.objectid,
>>>> +                           ins.offset, 1);
>>>> +
>>>>           em = alloc_extent_map();
>>>>           if (!em) {
>>>>               ret = -ENOMEM;
>>>> @@ -6745,6 +6752,10 @@ static struct extent_map 
>>>> *btrfs_new_extent_direct(struct inode *inode,
>>>>           return ERR_PTR(ret);
>>>>       }
>>>>
>>>> +    btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>>> +                       root->root_key.objectid,
>>>> +                       ins.offset, 1);
>>>> +
>>>>       return em;
>>>>   }
>>>>
>>>> @@ -9266,6 +9277,11 @@ static int 
>>>> __btrfs_prealloc_file_range(struct inode *inode, int mode,
>>>>                   btrfs_end_transaction(trans, root);
>>>>               break;
>>>>           }
>>>> +
>>>> + btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>>> +                           root->root_key.objectid,
>>>> +                           ins.offset, 1);
>>>> +
>>>>           btrfs_drop_extent_cache(inode, cur_offset,
>>>>                       cur_offset + ins.offset -1, 0);
>>>>
>>>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>>>> index 48b60db..d147082 100644
>>>> --- a/fs/btrfs/qgroup.c
>>>> +++ b/fs/btrfs/qgroup.c
>>>> @@ -72,6 +72,7 @@ struct btrfs_qgroup {
>>>>       /*
>>>>        * reservation tracking
>>>>        */
>>>> +    u64 may_use;
>>>>       u64 reserved;
>>>>
>>>>       /*
>>>> @@ -1414,6 +1415,8 @@ static int qgroup_excl_accounting(struct 
>>>> btrfs_fs_info *fs_info,
>>>>       WARN_ON(sign < 0 && qgroup->excl < oper->num_bytes);
>>>>       qgroup->excl += sign * oper->num_bytes;
>>>>       qgroup->excl_cmpr += sign * oper->num_bytes;
>>>> +    if (sign > 0)
>>>> +        qgroup->reserved -= oper->num_bytes;
>>>>
>>>>       qgroup_dirty(fs_info, qgroup);
>>>>
>>>> @@ -1434,6 +1437,8 @@ static int qgroup_excl_accounting(struct 
>>>> btrfs_fs_info *fs_info,
>>>>           qgroup->excl += sign * oper->num_bytes;
>>>>           if (sign < 0)
>>>>               WARN_ON(qgroup->excl < oper->num_bytes);
>>>> +        if (sign > 0)
>>>> +            qgroup->reserved -= oper->num_bytes;
>>>>           qgroup->excl_cmpr += sign * oper->num_bytes;
>>>>           qgroup_dirty(fs_info, qgroup);
>>>>
>>>> @@ -2359,6 +2364,61 @@ out:
>>>>       return ret;
>>>>   }
>>>>
>>>> +int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
>>>> +                        u64 ref_root,
>>>> +                        u64 num_bytes,
>>>> +                        int sign)
>>>> +{
>>>> +    struct btrfs_root *quota_root;
>>>> +    struct btrfs_qgroup *qgroup;
>>>> +    int ret = 0;
>>>> +    struct ulist_node *unode;
>>>> +    struct ulist_iterator uiter;
>>>> +
>>>> +    if (!is_fstree(ref_root) || !fs_info->quota_enabled)
>>>> +        return 0;
>>>> +
>>>> +    if (num_bytes == 0)
>>>> +        return 0;
>>>> +
>>>> +    spin_lock(&fs_info->qgroup_lock);
>>>> +    quota_root = fs_info->quota_root;
>>>> +    if (!quota_root)
>>>> +        goto out;
>>>> +
>>>> +    qgroup = find_qgroup_rb(fs_info, ref_root);
>>>> +    if (!qgroup)
>>>> +        goto out;
>>>> +
>>>> +    ulist_reinit(fs_info->qgroup_ulist);
>>>> +    ret = ulist_add(fs_info->qgroup_ulist, qgroup->qgroupid,
>>>> +            (uintptr_t)qgroup, GFP_ATOMIC);
>>>> +    if (ret < 0)
>>>> +        goto out;
>>>> +
>>>> +    ULIST_ITER_INIT(&uiter);
>>>> +    while ((unode = ulist_next(fs_info->qgroup_ulist, &uiter))) {
>>>> +        struct btrfs_qgroup *qg;
>>>> +        struct btrfs_qgroup_list *glist;
>>>> +
>>>> +        qg = u64_to_ptr(unode->aux);
>>>> +
>>>> +        qg->reserved += sign * num_bytes;
>>>> +
>>>> +        list_for_each_entry(glist, &qg->groups, next_group) {
>>>> +            ret = ulist_add(fs_info->qgroup_ulist,
>>>> +                    glist->group->qgroupid,
>>>> +                    (uintptr_t)glist->group, GFP_ATOMIC);
>>>> +            if (ret < 0)
>>>> +                goto out;
>>>> +        }
>>>> +    }
>>>> +
>>>> +out:
>>>> +    spin_unlock(&fs_info->qgroup_lock);
>>>> +    return ret;
>>>> +}
>>>> +
>>>>   /*
>>>>    * reserve some space for a qgroup and all its parents. The 
>>>> reservation takes
>>>>    * place with start_transaction or dealloc_reserve, similar to 
>>>> ENOSPC
>>>> @@ -2407,14 +2467,14 @@ int btrfs_qgroup_reserve(struct btrfs_root 
>>>> *root, u64 num_bytes)
>>>>           qg = u64_to_ptr(unode->aux);
>>>>
>>>>           if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
>>>> -            qg->reserved + (s64)qg->rfer + num_bytes >
>>>> +            qg->reserved + qg->may_use + (s64)qg->rfer + num_bytes >
>>>>               qg->max_rfer) {
>>>>               ret = -EDQUOT;
>>>>               goto out;
>>>>           }
>>>>
>>>>           if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) &&
>>>> -            qg->reserved + (s64)qg->excl + num_bytes >
>>>> +            qg->reserved + qg->may_use + (s64)qg->excl + num_bytes >
>>>>               qg->max_excl) {
>>>>               ret = -EDQUOT;
>>>>               goto out;
>>>> @@ -2438,7 +2498,7 @@ int btrfs_qgroup_reserve(struct btrfs_root 
>>>> *root, u64 num_bytes)
>>>>
>>>>           qg = u64_to_ptr(unode->aux);
>>>>
>>>> -        qg->reserved += num_bytes;
>>>> +        qg->may_use += num_bytes;
>>>>       }
>>>>
>>>>   out:
>>>> @@ -2484,7 +2544,7 @@ void btrfs_qgroup_free(struct btrfs_root 
>>>> *root, u64 num_bytes)
>>>>
>>>>           qg = u64_to_ptr(unode->aux);
>>>>
>>>> -        qg->reserved -= num_bytes;
>>>> +        qg->may_use -= num_bytes;
>>>>
>>>>           list_for_each_entry(glist, &qg->groups, next_group) {
>>>>               ret = ulist_add(fs_info->qgroup_ulist,
>>>> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
>>>> index 18cc68c..31de88e 100644
>>>> --- a/fs/btrfs/qgroup.h
>>>> +++ b/fs/btrfs/qgroup.h
>>>> @@ -95,6 +95,10 @@ int btrfs_run_qgroups(struct btrfs_trans_handle 
>>>> *trans,
>>>>   int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
>>>>                struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
>>>>                struct btrfs_qgroup_inherit *inherit);
>>>> +int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
>>>> +                       u64 ref_root,
>>>> +                       u64 num_bytes,
>>>> +                       int sign);
>>>>   int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
>>>>   void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);
>>>>
>>>
>>
>> .
>>
>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> .
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2] Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use.
  2014-12-26  5:43                 ` Satoru Takeuchi
  2014-12-26  6:49                   ` Dongsheng Yang
@ 2014-12-28  2:03                   ` Dongsheng Yang
  2015-01-05  6:16                   ` [PATCH v3 0/3] Btrfs: Enhancment for qgroup Dongsheng Yang
  2 siblings, 0 replies; 20+ messages in thread
From: Dongsheng Yang @ 2014-12-28  2:03 UTC (permalink / raw)
  To: Satoru Takeuchi; +Cc: Dongsheng Yang, jbacik, linux-btrfs

On Fri, Dec 26, 2014 at 1:43 PM, Satoru Takeuchi
<takeuchi_satoru@jp.fujitsu.com> wrote:
> Hi Yang,
>
> On 2014/12/26 10:32, Dongsheng Yang wrote:
>>
>> Hi Satoru,
>>
>> I saw your mail of "[BUG] "Quota Ignored On write" problem still exist
>> with 3.16-rc5
>> <http://news.gmane.org/find-root.php?message_id=53C8DEB0.1060404%40jp.fujitsu.com>"
>> in gmane.
>> I guess this patch will fix the problem you mentioned.
>> Could you help to give a try?
>
>
> Although it reached disk quota before writing whole 1.5GB,
> the copied size was only about 700 MB. In this case,
> expected behavior is to reach disk quota just after
> writing 1 GB.

Hi Satoru-san,

After some more investigation, I did meet this problem you described here.
There is a window between "reserved += num_bytes" and "may_use -= num_bytes".
And the similar problem exists in real space accounting. I am cooking a patch
to solve this problem.

Thanx for your test a lot.

Yang
>
> * 3.19-rc1 without your two v2 patches
>
> ===============================================================================
> + mkfs.btrfs -f /dev/vdb
> Btrfs v3.17
> See http://btrfs.wiki.kernel.org for more information.
>
> Turning ON incompat feature 'extref': increased hardlink limit per file to
> 65536
> fs created label (null) on /dev/vdb
>         nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
> + mount /dev/vdb /root/btrfs-auto-test/
> + ret=0
> + btrfs quota enable /root/btrfs-auto-test/
> + btrfs subvolume create /root/btrfs-auto-test//sub
> Create subvolume '/root/btrfs-auto-test/sub'
> + btrfs qgroup limit 1G /root/btrfs-auto-test//sub
> + dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=1500000
> 1500000+0 records in
> 1500000+0 records out
> 1536000000 bytes (1.5 GB) copied, 25.6601 s, 59.9 MB/s
> ===============================================================================
>
> * 3.19-rc1 with your two v2 patches
>
> ===============================================================================
> + mkfs.btrfs -f /dev/vdb
> Btrfs v3.17
> See http://btrfs.wiki.kernel.org for more information.
>
> Turning ON incompat feature 'extref': increased hardlink limit per file to
> 65536
> fs created label (null) on /dev/vdb
>         nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
> + mount /dev/vdb /root/btrfs-auto-test/
> + ret=0
> + btrfs quota enable /root/btrfs-auto-test/
> + btrfs subvolume create /root/btrfs-auto-test//sub
> Create subvolume '/root/btrfs-auto-test/sub'
> + btrfs qgroup limit 1G /root/btrfs-auto-test//sub
> + dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=1500000
> dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded
> 681353+0 records in
> 681352+0 records out
> 697704448 bytes (698 MB) copied, 8.15563 s, 85.5 MB/s
> ===============================================================================
>
> Thanks,
> Satoru
>
>>
>> Thanx
>> Yang
>>
>>
>> On 12/12/2014 04:44 PM, Dongsheng Yang wrote:
>>>
>>> Currently, for pre_alloc or delay_alloc, the bytes will be accounted
>>> in space_info by the three guys.
>>> space_info->bytes_may_use --- space_info->reserved --- space_info->used.
>>> But on the other hand, in qgroup, there are only two counters to account
>>> the
>>> bytes, qgroup->reserved and qgroup->excl. And qg->reserved accounts
>>> bytes in space_info->bytes_may_use and qg->excl accounts bytes in
>>> space_info->used. So the bytes in space_info->reserved is not accounted
>>> in qgroup. If so, there is a window we can exceed the quota limit when
>>> bytes is in space_info->reserved.
>>>
>>> Example:
>>>         # btrfs quota enable /mnt
>>>         # btrfs qgroup limit -e 10M /mnt
>>>         # for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
>>>         # sync
>>>         # btrfs qgroup show -pcre /mnt
>>> qgroupid rfer     excl     max_rfer max_excl parent  child
>>> -------- ----     ----     -------- -------- ------  -----
>>> 0/5      20987904 20987904 0        10485760 ---     ---
>>>
>>> qg->excl is 20987904 larger than max_excl 10485760.
>>>
>>> This patch introduce a new counter named may_use to qgroup, then
>>> there are three counters in qgroup to account bytes in space_info
>>> as below.
>>> space_info->bytes_may_use --- space_info->reserved --- space_info->used.
>>> qgroup->may_use           --- qgroup->reserved     --- qgroup->excl
>>>
>>> With this patch applied:
>>>         # btrfs quota enable /mnt
>>>         # btrfs qgroup limit -e 10M /mnt
>>>         # for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
>>> fallocate: /mnt/data9: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data10: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data11: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data12: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data13: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data14: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data15: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data16: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data17: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data18: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data19: fallocate failed: Disk quota exceeded
>>>         # sync
>>>         # btrfs qgroup show -pcre /mnt
>>> qgroupid rfer    excl    max_rfer max_excl parent  child
>>> -------- ----    ----    -------- -------- ------  -----
>>> 0/5      9453568 9453568 0        10485760 ---     ---
>>>
>>> Reported-by: Cyril SCETBON<cyril.scetbon@free.fr>
>>> Signed-off-by: Dongsheng Yang<yangds.fnst@cn.fujitsu.com>
>>> ---
>>> Changelog:
>>>         v1 -> v2:
>>>                 Remove the redundant check for fs_info->quota_enabled.
>>>
>>>   fs/btrfs/extent-tree.c | 20 ++++++++++++++-
>>>   fs/btrfs/inode.c       | 18 ++++++++++++-
>>>   fs/btrfs/qgroup.c      | 68
>>> +++++++++++++++++++++++++++++++++++++++++++++++---
>>>   fs/btrfs/qgroup.h      |  4 +++
>>>   4 files changed, 104 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index 014b7f2..f4ad737 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -5500,8 +5500,12 @@ static int pin_down_extent(struct btrfs_root
>>> *root,
>>>
>>>         set_extent_dirty(root->fs_info->pinned_extents, bytenr,
>>>                          bytenr + num_bytes - 1, GFP_NOFS |
>>> __GFP_NOFAIL);
>>> -       if (reserved)
>>> +       if (reserved) {
>>> +               btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +
>>> root->root_key.objectid,
>>> +                                                  num_bytes, -1);
>>>                 trace_btrfs_reserved_extent_free(root, bytenr,
>>> num_bytes);
>>> +       }
>>>         return 0;
>>>   }
>>>
>>> @@ -6230,6 +6234,9 @@ void btrfs_free_tree_block(struct
>>> btrfs_trans_handle *trans,
>>>                 btrfs_update_reserved_bytes(cache, buf->len,
>>> RESERVE_FREE, 0);
>>>                 trace_btrfs_reserved_extent_free(root, buf->start,
>>> buf->len);
>>>                 pin = 0;
>>> +               btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +
>>> root->root_key.objectid,
>>> +                                                  buf->len, -1);
>>>         }
>>>   out:
>>>         if (pin)
>>> @@ -6964,7 +6971,11 @@ static int __btrfs_free_reserved_extent(struct
>>> btrfs_root *root,
>>>         else {
>>>                 btrfs_add_free_space(cache, start, len);
>>>                 btrfs_update_reserved_bytes(cache, len, RESERVE_FREE,
>>> delalloc);
>>> +               btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +
>>> root->root_key.objectid,
>>> +                                                  len, -1);
>>>         }
>>> +
>>>         btrfs_put_block_group(cache);
>>>
>>>         trace_btrfs_reserved_extent_free(root, start, len);
>>> @@ -7200,6 +7211,9 @@ int btrfs_alloc_logged_file_extent(struct
>>> btrfs_trans_handle *trans,
>>>         BUG_ON(ret); /* logic error */
>>>         ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
>>>                                          0, owner, offset, ins, 1);
>>> +       btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                                          root->root_key.objectid,
>>> +                                          ins->offset, 1);
>>>         btrfs_put_block_group(block_group);
>>>         return ret;
>>>   }
>>> @@ -7346,6 +7360,10 @@ struct extent_buffer
>>> *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
>>>                 return ERR_PTR(ret);
>>>         }
>>>
>>> +       btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                                          root_objectid,
>>> +                                          ins.offset, 1);
>>> +
>>>         buf = btrfs_init_new_buffer(trans, root, ins.objectid,
>>>                                     blocksize, level);
>>>         BUG_ON(IS_ERR(buf)); /* -ENOMEM */
>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> index d23362f..0c824f3 100644
>>> --- a/fs/btrfs/inode.c
>>> +++ b/fs/btrfs/inode.c
>>> @@ -59,6 +59,7 @@
>>>   #include "backref.h"
>>>   #include "hash.h"
>>>   #include "props.h"
>>> +#include "qgroup.h"
>>>
>>>   struct btrfs_iget_args {
>>>         struct btrfs_key *location;
>>> @@ -739,7 +740,9 @@ retry:
>>>                         }
>>>                         goto out_free;
>>>                 }
>>> -
>>> +               btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +
>>> root->root_key.objectid,
>>> +                                                  ins.offset, 1);
>>>                 /*
>>>                  * here we're doing allocation and writeback of the
>>>                  * compressed pages
>>> @@ -951,6 +954,10 @@ static noinline int cow_file_range(struct inode
>>> *inode,
>>>                 if (ret < 0)
>>>                         goto out_unlock;
>>>
>>> +               btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +
>>> root->root_key.objectid,
>>> +                                                  ins.offset, 1);
>>> +
>>>                 em = alloc_extent_map();
>>>                 if (!em) {
>>>                         ret = -ENOMEM;
>>> @@ -6745,6 +6752,10 @@ static struct extent_map
>>> *btrfs_new_extent_direct(struct inode *inode,
>>>                 return ERR_PTR(ret);
>>>         }
>>>
>>> +       btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                                          root->root_key.objectid,
>>> +                                          ins.offset, 1);
>>> +
>>>         return em;
>>>   }
>>>
>>> @@ -9266,6 +9277,11 @@ static int __btrfs_prealloc_file_range(struct
>>> inode *inode, int mode,
>>>                                 btrfs_end_transaction(trans, root);
>>>                         break;
>>>                 }
>>> +
>>> +               btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +
>>> root->root_key.objectid,
>>> +                                                  ins.offset, 1);
>>> +
>>>                 btrfs_drop_extent_cache(inode, cur_offset,
>>>                                         cur_offset + ins.offset -1, 0);
>>>
>>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>>> index 48b60db..d147082 100644
>>> --- a/fs/btrfs/qgroup.c
>>> +++ b/fs/btrfs/qgroup.c
>>> @@ -72,6 +72,7 @@ struct btrfs_qgroup {
>>>         /*
>>>          * reservation tracking
>>>          */
>>> +       u64 may_use;
>>>         u64 reserved;
>>>
>>>         /*
>>> @@ -1414,6 +1415,8 @@ static int qgroup_excl_accounting(struct
>>> btrfs_fs_info *fs_info,
>>>         WARN_ON(sign < 0 && qgroup->excl < oper->num_bytes);
>>>         qgroup->excl += sign * oper->num_bytes;
>>>         qgroup->excl_cmpr += sign * oper->num_bytes;
>>> +       if (sign > 0)
>>> +               qgroup->reserved -= oper->num_bytes;
>>>
>>>         qgroup_dirty(fs_info, qgroup);
>>>
>>> @@ -1434,6 +1437,8 @@ static int qgroup_excl_accounting(struct
>>> btrfs_fs_info *fs_info,
>>>                 qgroup->excl += sign * oper->num_bytes;
>>>                 if (sign < 0)
>>>                         WARN_ON(qgroup->excl < oper->num_bytes);
>>> +               if (sign > 0)
>>> +                       qgroup->reserved -= oper->num_bytes;
>>>                 qgroup->excl_cmpr += sign * oper->num_bytes;
>>>                 qgroup_dirty(fs_info, qgroup);
>>>
>>> @@ -2359,6 +2364,61 @@ out:
>>>         return ret;
>>>   }
>>>
>>> +int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
>>> +                                           u64 ref_root,
>>> +                                           u64 num_bytes,
>>> +                                           int sign)
>>> +{
>>> +       struct btrfs_root *quota_root;
>>> +       struct btrfs_qgroup *qgroup;
>>> +       int ret = 0;
>>> +       struct ulist_node *unode;
>>> +       struct ulist_iterator uiter;
>>> +
>>> +       if (!is_fstree(ref_root) || !fs_info->quota_enabled)
>>> +               return 0;
>>> +
>>> +       if (num_bytes == 0)
>>> +               return 0;
>>> +
>>> +       spin_lock(&fs_info->qgroup_lock);
>>> +       quota_root = fs_info->quota_root;
>>> +       if (!quota_root)
>>> +               goto out;
>>> +
>>> +       qgroup = find_qgroup_rb(fs_info, ref_root);
>>> +       if (!qgroup)
>>> +               goto out;
>>> +
>>> +       ulist_reinit(fs_info->qgroup_ulist);
>>> +       ret = ulist_add(fs_info->qgroup_ulist, qgroup->qgroupid,
>>> +                       (uintptr_t)qgroup, GFP_ATOMIC);
>>> +       if (ret < 0)
>>> +               goto out;
>>> +
>>> +       ULIST_ITER_INIT(&uiter);
>>> +       while ((unode = ulist_next(fs_info->qgroup_ulist, &uiter))) {
>>> +               struct btrfs_qgroup *qg;
>>> +               struct btrfs_qgroup_list *glist;
>>> +
>>> +               qg = u64_to_ptr(unode->aux);
>>> +
>>> +               qg->reserved += sign * num_bytes;
>>> +
>>> +               list_for_each_entry(glist, &qg->groups, next_group) {
>>> +                       ret = ulist_add(fs_info->qgroup_ulist,
>>> +                                       glist->group->qgroupid,
>>> +                                       (uintptr_t)glist->group,
>>> GFP_ATOMIC);
>>> +                       if (ret < 0)
>>> +                               goto out;
>>> +               }
>>> +       }
>>> +
>>> +out:
>>> +       spin_unlock(&fs_info->qgroup_lock);
>>> +       return ret;
>>> +}
>>> +
>>>   /*
>>>    * reserve some space for a qgroup and all its parents. The reservation
>>> takes
>>>    * place with start_transaction or dealloc_reserve, similar to ENOSPC
>>> @@ -2407,14 +2467,14 @@ int btrfs_qgroup_reserve(struct btrfs_root *root,
>>> u64 num_bytes)
>>>                 qg = u64_to_ptr(unode->aux);
>>>
>>>                 if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
>>> -                   qg->reserved + (s64)qg->rfer + num_bytes >
>>> +                   qg->reserved + qg->may_use + (s64)qg->rfer +
>>> num_bytes >
>>>                     qg->max_rfer) {
>>>                         ret = -EDQUOT;
>>>                         goto out;
>>>                 }
>>>
>>>                 if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) &&
>>> -                   qg->reserved + (s64)qg->excl + num_bytes >
>>> +                   qg->reserved + qg->may_use + (s64)qg->excl +
>>> num_bytes >
>>>                     qg->max_excl) {
>>>                         ret = -EDQUOT;
>>>                         goto out;
>>> @@ -2438,7 +2498,7 @@ int btrfs_qgroup_reserve(struct btrfs_root *root,
>>> u64 num_bytes)
>>>
>>>                 qg = u64_to_ptr(unode->aux);
>>>
>>> -               qg->reserved += num_bytes;
>>> +               qg->may_use += num_bytes;
>>>         }
>>>
>>>   out:
>>> @@ -2484,7 +2544,7 @@ void btrfs_qgroup_free(struct btrfs_root *root, u64
>>> num_bytes)
>>>
>>>                 qg = u64_to_ptr(unode->aux);
>>>
>>> -               qg->reserved -= num_bytes;
>>> +               qg->may_use -= num_bytes;
>>>
>>>                 list_for_each_entry(glist, &qg->groups, next_group) {
>>>                         ret = ulist_add(fs_info->qgroup_ulist,
>>> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
>>> index 18cc68c..31de88e 100644
>>> --- a/fs/btrfs/qgroup.h
>>> +++ b/fs/btrfs/qgroup.h
>>> @@ -95,6 +95,10 @@ int btrfs_run_qgroups(struct btrfs_trans_handle
>>> *trans,
>>>   int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
>>>                          struct btrfs_fs_info *fs_info, u64 srcid, u64
>>> objectid,
>>>                          struct btrfs_qgroup_inherit *inherit);
>>> +int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
>>> +                                      u64 ref_root,
>>> +                                      u64 num_bytes,
>>> +                                      int sign);
>>>   int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
>>>   void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);
>>>
>>
>
> --
>
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3 0/3] Btrfs: Enhancment for qgroup.
  2014-12-26  5:43                 ` Satoru Takeuchi
  2014-12-26  6:49                   ` Dongsheng Yang
  2014-12-28  2:03                   ` Dongsheng Yang
@ 2015-01-05  6:16                   ` Dongsheng Yang
  2015-01-05  6:16                     ` [PATCH v3 1/3] Btrfs: qgroup: free reserved in exceeding quota Dongsheng Yang
                                       ` (3 more replies)
  2 siblings, 4 replies; 20+ messages in thread
From: Dongsheng Yang @ 2015-01-05  6:16 UTC (permalink / raw)
  To: takeuchi_satoru, jbacik; +Cc: linux-btrfs, Dongsheng Yang

Hi Josef and others,

This patch set is about enhancing qgroup.

[1/3]: fix a bug about qgroup leak when we exceed quota limit,
	It is reviewd by Josef.
[2/3]: introduce a new accounter in qgroup to close a window where
	user will exceed the limit by qgroup. It "looks good" to Josef.
[3/3]: a new patch to fix a bug reported by Satoru.

BTW, I have some other plan about qgroup in my TODO list:

Kernel:
a). adjust the accounters in parent qgroup when we move
the child qgroup.
	Currently, when we move a qgroup, the parent qgroup
will not updated at the same time. This will cause some wrong
numbers in qgroup.

b). add a ioctl to show the qgroup info.
	Command "btrfs qgroup show" is showing the qgroup info
read from qgroup tree. But there is some information in memory
which is not synced into device. Then it will show some outdate
number.

c). limit and account size in 3 modes, data, metadata and both.
	qgroup is accounting the size both of data and metadata
togather, but to a user, the data size is the most useful to them.

d). remove a subvolume related qgroup when subvolume is deleted and
there is no other reference to it.

user-tool:
a). Add the unit of B/K/M/G to btrfs qgroup show.
b). get the information via ioctl rather than reading it from
btree. Will keep the old way as a fallback for compatiblity.

Any comment and sugguestion is welcome. :)

Yang

Dongsheng Yang (3):
  Btrfs: qgroup: free reserved in exceeding quota.
  Btrfs: qgroup: Introduce a may_use to account
    space_info->bytes_may_use.
  Btrfs: qgroup, Account data space in more proper timings.

 fs/btrfs/extent-tree.c | 41 +++++++++++++++++++++++-------
 fs/btrfs/file.c        |  9 -------
 fs/btrfs/inode.c       | 18 ++++++++++++-
 fs/btrfs/qgroup.c      | 68 +++++++++++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/qgroup.h      |  4 +++
 5 files changed, 117 insertions(+), 23 deletions(-)

-- 
1.8.4.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3 1/3] Btrfs: qgroup: free reserved in exceeding quota.
  2015-01-05  6:16                   ` [PATCH v3 0/3] Btrfs: Enhancment for qgroup Dongsheng Yang
@ 2015-01-05  6:16                     ` Dongsheng Yang
  2015-01-05  6:16                     ` [PATCH v3 2/3] Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use Dongsheng Yang
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 20+ messages in thread
From: Dongsheng Yang @ 2015-01-05  6:16 UTC (permalink / raw)
  To: takeuchi_satoru, jbacik; +Cc: linux-btrfs, Dongsheng Yang

When we exceed quota limit in writing, we will free
some reserved extent when we need to drop but not free
account in qgroup. It means, each time we exceed quota
in writing, there will be some remain space in qg->reserved
we can not use any more. If things go on like this, the
all space will be ate up.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
---
 fs/btrfs/extent-tree.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a80b971..88b4e32 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5275,8 +5275,11 @@ out_fail:
 			to_free = 0;
 	}
 	spin_unlock(&BTRFS_I(inode)->lock);
-	if (dropped)
+	if (dropped) {
+		if (root->fs_info->quota_enabled)
+			btrfs_qgroup_free(root, dropped * root->nodesize);
 		to_free += btrfs_calc_trans_metadata_size(root, dropped);
+	}
 
 	if (to_free) {
 		btrfs_block_rsv_release(root, block_rsv, to_free);
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 2/3] Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use.
  2015-01-05  6:16                   ` [PATCH v3 0/3] Btrfs: Enhancment for qgroup Dongsheng Yang
  2015-01-05  6:16                     ` [PATCH v3 1/3] Btrfs: qgroup: free reserved in exceeding quota Dongsheng Yang
@ 2015-01-05  6:16                     ` Dongsheng Yang
  2015-01-05  6:16                     ` [PATCH v3 3/3] Btrfs: qgroup, Account data space in more proper timings Dongsheng Yang
  2015-01-07  0:49                     ` [PATCH v3 0/3] Btrfs: Enhancment for qgroup Satoru Takeuchi
  3 siblings, 0 replies; 20+ messages in thread
From: Dongsheng Yang @ 2015-01-05  6:16 UTC (permalink / raw)
  To: takeuchi_satoru, jbacik; +Cc: linux-btrfs, Dongsheng Yang

Currently, for pre_alloc or delay_alloc, the bytes will be accounted
in space_info by the three guys.
space_info->bytes_may_use --- space_info->reserved --- space_info->used.
But on the other hand, in qgroup, there are only two counters to account the
bytes, qgroup->reserved and qgroup->excl. And qg->reserved accounts
bytes in space_info->bytes_may_use and qg->excl accounts bytes in
space_info->used. So the bytes in space_info->reserved is not accounted
in qgroup. If so, there is a window we can exceed the quota limit when
bytes is in space_info->reserved.

Example:
	# btrfs quota enable /mnt
	# btrfs qgroup limit -e 10M /mnt
	# for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
	# sync
	# btrfs qgroup show -pcre /mnt
qgroupid rfer     excl     max_rfer max_excl parent  child
-------- ----     ----     -------- -------- ------  -----
0/5      20987904 20987904 0        10485760 ---     ---

qg->excl is 20987904 larger than max_excl 10485760.

This patch introduce a new counter named may_use to qgroup, then
there are three counters in qgroup to account bytes in space_info
as below.
space_info->bytes_may_use --- space_info->reserved --- space_info->used.
qgroup->may_use           --- qgroup->reserved     --- qgroup->excl

With this patch applied:
	# btrfs quota enable /mnt
	# btrfs qgroup limit -e 10M /mnt
	# for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
fallocate: /mnt/data9: fallocate failed: Disk quota exceeded
fallocate: /mnt/data10: fallocate failed: Disk quota exceeded
fallocate: /mnt/data11: fallocate failed: Disk quota exceeded
fallocate: /mnt/data12: fallocate failed: Disk quota exceeded
fallocate: /mnt/data13: fallocate failed: Disk quota exceeded
fallocate: /mnt/data14: fallocate failed: Disk quota exceeded
fallocate: /mnt/data15: fallocate failed: Disk quota exceeded
fallocate: /mnt/data16: fallocate failed: Disk quota exceeded
fallocate: /mnt/data17: fallocate failed: Disk quota exceeded
fallocate: /mnt/data18: fallocate failed: Disk quota exceeded
fallocate: /mnt/data19: fallocate failed: Disk quota exceeded
	# sync
	# btrfs qgroup show -pcre /mnt
qgroupid rfer    excl    max_rfer max_excl parent  child
-------- ----    ----    -------- -------- ------  -----
0/5      9453568 9453568 0        10485760 ---     ---

Reported-by: Cyril SCETBON <cyril.scetbon@free.fr>
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c | 20 ++++++++++++++-
 fs/btrfs/inode.c       | 18 ++++++++++++-
 fs/btrfs/qgroup.c      | 68 +++++++++++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/qgroup.h      |  4 +++
 4 files changed, 104 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 88b4e32..d1a7ce0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5512,8 +5512,12 @@ static int pin_down_extent(struct btrfs_root *root,
 
 	set_extent_dirty(root->fs_info->pinned_extents, bytenr,
 			 bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
-	if (reserved)
+	if (reserved) {
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root->root_key.objectid,
+						   num_bytes, -1);
 		trace_btrfs_reserved_extent_free(root, bytenr, num_bytes);
+	}
 	return 0;
 }
 
@@ -6244,6 +6248,9 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 		btrfs_update_reserved_bytes(cache, buf->len, RESERVE_FREE, 0);
 		trace_btrfs_reserved_extent_free(root, buf->start, buf->len);
 		pin = 0;
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root->root_key.objectid,
+						   buf->len, -1);
 	}
 out:
 	if (pin)
@@ -6978,7 +6985,11 @@ static int __btrfs_free_reserved_extent(struct btrfs_root *root,
 	else {
 		btrfs_add_free_space(cache, start, len);
 		btrfs_update_reserved_bytes(cache, len, RESERVE_FREE, delalloc);
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root->root_key.objectid,
+						   len, -1);
 	}
+
 	btrfs_put_block_group(cache);
 
 	trace_btrfs_reserved_extent_free(root, start, len);
@@ -7214,6 +7225,9 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
 	BUG_ON(ret); /* logic error */
 	ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
 					 0, owner, offset, ins, 1);
+	btrfs_qgroup_update_reserved_bytes(root->fs_info,
+					   root->root_key.objectid,
+					   ins->offset, 1);
 	btrfs_put_block_group(block_group);
 	return ret;
 }
@@ -7360,6 +7374,10 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
 		return ERR_PTR(ret);
 	}
 
+	btrfs_qgroup_update_reserved_bytes(root->fs_info,
+					   root_objectid,
+					   ins.offset, 1);
+
 	buf = btrfs_init_new_buffer(trans, root, ins.objectid,
 				    blocksize, level);
 	BUG_ON(IS_ERR(buf)); /* -ENOMEM */
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e687bb0..e350cd6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -59,6 +59,7 @@
 #include "backref.h"
 #include "hash.h"
 #include "props.h"
+#include "qgroup.h"
 
 struct btrfs_iget_args {
 	struct btrfs_key *location;
@@ -745,7 +746,9 @@ retry:
 			}
 			goto out_free;
 		}
-
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root->root_key.objectid,
+						   ins.offset, 1);
 		/*
 		 * here we're doing allocation and writeback of the
 		 * compressed pages
@@ -970,6 +973,10 @@ static noinline int cow_file_range(struct inode *inode,
 		if (ret < 0)
 			goto out_unlock;
 
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root->root_key.objectid,
+						   ins.offset, 1);
+
 		em = alloc_extent_map();
 		if (!em) {
 			ret = -ENOMEM;
@@ -6797,6 +6804,10 @@ static struct extent_map *btrfs_new_extent_direct(struct inode *inode,
 		return ERR_PTR(ret);
 	}
 
+	btrfs_qgroup_update_reserved_bytes(root->fs_info,
+					   root->root_key.objectid,
+					   ins.offset, 1);
+
 	return em;
 }
 
@@ -9321,6 +9332,11 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 				btrfs_end_transaction(trans, root);
 			break;
 		}
+
+		btrfs_qgroup_update_reserved_bytes(root->fs_info,
+						   root->root_key.objectid,
+						   ins.offset, 1);
+
 		btrfs_drop_extent_cache(inode, cur_offset,
 					cur_offset + ins.offset -1, 0);
 
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 48b60db..d147082 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -72,6 +72,7 @@ struct btrfs_qgroup {
 	/*
 	 * reservation tracking
 	 */
+	u64 may_use;
 	u64 reserved;
 
 	/*
@@ -1414,6 +1415,8 @@ static int qgroup_excl_accounting(struct btrfs_fs_info *fs_info,
 	WARN_ON(sign < 0 && qgroup->excl < oper->num_bytes);
 	qgroup->excl += sign * oper->num_bytes;
 	qgroup->excl_cmpr += sign * oper->num_bytes;
+	if (sign > 0)
+		qgroup->reserved -= oper->num_bytes;
 
 	qgroup_dirty(fs_info, qgroup);
 
@@ -1434,6 +1437,8 @@ static int qgroup_excl_accounting(struct btrfs_fs_info *fs_info,
 		qgroup->excl += sign * oper->num_bytes;
 		if (sign < 0)
 			WARN_ON(qgroup->excl < oper->num_bytes);
+		if (sign > 0)
+			qgroup->reserved -= oper->num_bytes;
 		qgroup->excl_cmpr += sign * oper->num_bytes;
 		qgroup_dirty(fs_info, qgroup);
 
@@ -2359,6 +2364,61 @@ out:
 	return ret;
 }
 
+int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
+					    u64 ref_root,
+					    u64 num_bytes,
+					    int sign)
+{
+	struct btrfs_root *quota_root;
+	struct btrfs_qgroup *qgroup;
+	int ret = 0;
+	struct ulist_node *unode;
+	struct ulist_iterator uiter;
+
+	if (!is_fstree(ref_root) || !fs_info->quota_enabled)
+		return 0;
+
+	if (num_bytes == 0)
+		return 0;
+
+	spin_lock(&fs_info->qgroup_lock);
+	quota_root = fs_info->quota_root;
+	if (!quota_root)
+		goto out;
+
+	qgroup = find_qgroup_rb(fs_info, ref_root);
+	if (!qgroup)
+		goto out;
+
+	ulist_reinit(fs_info->qgroup_ulist);
+	ret = ulist_add(fs_info->qgroup_ulist, qgroup->qgroupid,
+			(uintptr_t)qgroup, GFP_ATOMIC);
+	if (ret < 0)
+		goto out;
+
+	ULIST_ITER_INIT(&uiter);
+	while ((unode = ulist_next(fs_info->qgroup_ulist, &uiter))) {
+		struct btrfs_qgroup *qg;
+		struct btrfs_qgroup_list *glist;
+
+		qg = u64_to_ptr(unode->aux);
+
+		qg->reserved += sign * num_bytes;
+
+		list_for_each_entry(glist, &qg->groups, next_group) {
+			ret = ulist_add(fs_info->qgroup_ulist,
+					glist->group->qgroupid,
+					(uintptr_t)glist->group, GFP_ATOMIC);
+			if (ret < 0)
+				goto out;
+		}
+	}
+
+out:
+	spin_unlock(&fs_info->qgroup_lock);
+	return ret;
+}
+
 /*
  * reserve some space for a qgroup and all its parents. The reservation takes
  * place with start_transaction or dealloc_reserve, similar to ENOSPC
@@ -2407,14 +2467,14 @@ int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
 		qg = u64_to_ptr(unode->aux);
 
 		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
-		    qg->reserved + (s64)qg->rfer + num_bytes >
+		    qg->reserved + qg->may_use + (s64)qg->rfer + num_bytes >
 		    qg->max_rfer) {
 			ret = -EDQUOT;
 			goto out;
 		}
 
 		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) &&
-		    qg->reserved + (s64)qg->excl + num_bytes >
+		    qg->reserved + qg->may_use + (s64)qg->excl + num_bytes >
 		    qg->max_excl) {
 			ret = -EDQUOT;
 			goto out;
@@ -2438,7 +2498,7 @@ int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
 
 		qg = u64_to_ptr(unode->aux);
 
-		qg->reserved += num_bytes;
+		qg->may_use += num_bytes;
 	}
 
 out:
@@ -2484,7 +2544,7 @@ void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes)
 
 		qg = u64_to_ptr(unode->aux);
 
-		qg->reserved -= num_bytes;
+		qg->may_use -= num_bytes;
 
 		list_for_each_entry(glist, &qg->groups, next_group) {
 			ret = ulist_add(fs_info->qgroup_ulist,
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 18cc68c..31de88e 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -95,6 +95,10 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
 			 struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
 			 struct btrfs_qgroup_inherit *inherit);
+int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
+				       u64 ref_root,
+				       u64 num_bytes,
+				       int sign);
 int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
 void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);
 
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 3/3] Btrfs: qgroup, Account data space in more proper timings.
  2015-01-05  6:16                   ` [PATCH v3 0/3] Btrfs: Enhancment for qgroup Dongsheng Yang
  2015-01-05  6:16                     ` [PATCH v3 1/3] Btrfs: qgroup: free reserved in exceeding quota Dongsheng Yang
  2015-01-05  6:16                     ` [PATCH v3 2/3] Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use Dongsheng Yang
@ 2015-01-05  6:16                     ` Dongsheng Yang
  2015-01-07  0:49                     ` [PATCH v3 0/3] Btrfs: Enhancment for qgroup Satoru Takeuchi
  3 siblings, 0 replies; 20+ messages in thread
From: Dongsheng Yang @ 2015-01-05  6:16 UTC (permalink / raw)
  To: takeuchi_satoru, jbacik; +Cc: linux-btrfs, Dongsheng Yang

Currenly, in data writing, ->reserved is accounted in
fill_delalloc(), but ->may_use is released in clear_bit_hook()
which is called by btrfs_finish_ordered_io(). That's too late,
that said, between fill_delalloc() and btrfs_finish_ordered_io(),
the data is doublely accounted by qgroup. It will cause some
unexpected -EDQUOT.

Example:
	# btrfs quota enable /root/btrfs-auto-test/
	# btrfs subvolume create /root/btrfs-auto-test//sub
	Create subvolume '/root/btrfs-auto-test/sub'
	# btrfs qgroup limit 1G /root/btrfs-auto-test//sub
	dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=1500000
	dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded
	681353+0 records in
	681352+0 records out
	697704448 bytes (698 MB) copied, 8.15563 s, 85.5 MB/s
It's (698 MB) when we got an -EDQUOT, but we limit it by 1G.

This patch move the btrfs_qgroup_reserve/free() for data from
btrfs_delalloc_reserve/release_metadata() to btrfs_check_data_free_space()
and btrfs_free_reserved_data_space(). Then the accounter in qgroup
will be updated at the same time with the accounter in space_info updated.
In this way, the unexpected -EDQUOT will be killed.

Reported-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c | 16 +++++++++-------
 fs/btrfs/file.c        |  9 ---------
 2 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index d1a7ce0..67c2e28 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3774,12 +3774,16 @@ commit_trans:
 					      data_sinfo->flags, bytes, 1);
 		return -ENOSPC;
 	}
+	ret = btrfs_qgroup_reserve(root, bytes);
+	if (ret)
+		goto out;
 	data_sinfo->bytes_may_use += bytes;
 	trace_btrfs_space_reservation(root->fs_info, "space_info",
 				      data_sinfo->flags, bytes, 1);
+out:
 	spin_unlock(&data_sinfo->lock);
 
-	return 0;
+	return ret;
 }
 
 /*
@@ -3796,6 +3800,7 @@ void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes)
 	data_sinfo = root->fs_info->data_sinfo;
 	spin_lock(&data_sinfo->lock);
 	WARN_ON(data_sinfo->bytes_may_use < bytes);
+	btrfs_qgroup_free(root, bytes);
 	data_sinfo->bytes_may_use -= bytes;
 	trace_btrfs_space_reservation(root->fs_info, "space_info",
 				      data_sinfo->flags, bytes, 0);
@@ -5191,8 +5196,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
 	spin_unlock(&BTRFS_I(inode)->lock);
 
 	if (root->fs_info->quota_enabled) {
-		ret = btrfs_qgroup_reserve(root, num_bytes +
-					   nr_extents * root->nodesize);
+		ret = btrfs_qgroup_reserve(root, nr_extents * root->nodesize);
 		if (ret)
 			goto out_fail;
 	}
@@ -5200,8 +5204,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
 	ret = reserve_metadata_bytes(root, block_rsv, to_reserve, flush);
 	if (unlikely(ret)) {
 		if (root->fs_info->quota_enabled)
-			btrfs_qgroup_free(root, num_bytes +
-						nr_extents * root->nodesize);
+			btrfs_qgroup_free(root, nr_extents * root->nodesize);
 		goto out_fail;
 	}
 
@@ -5319,8 +5322,7 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
 	trace_btrfs_space_reservation(root->fs_info, "delalloc",
 				      btrfs_ino(inode), to_free, 0);
 	if (root->fs_info->quota_enabled) {
-		btrfs_qgroup_free(root, num_bytes +
-					dropped * root->nodesize);
+		btrfs_qgroup_free(root, dropped * root->nodesize);
 	}
 
 	btrfs_block_rsv_release(root, &root->fs_info->delalloc_block_rsv,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e409025..0ab1333 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2527,7 +2527,6 @@ static long btrfs_fallocate(struct file *file, int mode,
 {
 	struct inode *inode = file_inode(file);
 	struct extent_state *cached_state = NULL;
-	struct btrfs_root *root = BTRFS_I(inode)->root;
 	u64 cur_offset;
 	u64 last_byte;
 	u64 alloc_start;
@@ -2555,11 +2554,6 @@ static long btrfs_fallocate(struct file *file, int mode,
 	ret = btrfs_check_data_free_space(inode, alloc_end - alloc_start);
 	if (ret)
 		return ret;
-	if (root->fs_info->quota_enabled) {
-		ret = btrfs_qgroup_reserve(root, alloc_end - alloc_start);
-		if (ret)
-			goto out_reserve_fail;
-	}
 
 	mutex_lock(&inode->i_mutex);
 	ret = inode_newsize_ok(inode, alloc_end);
@@ -2677,9 +2671,6 @@ static long btrfs_fallocate(struct file *file, int mode,
 			     &cached_state, GFP_NOFS);
 out:
 	mutex_unlock(&inode->i_mutex);
-	if (root->fs_info->quota_enabled)
-		btrfs_qgroup_free(root, alloc_end - alloc_start);
-out_reserve_fail:
 	/* Let go of our reservation. */
 	btrfs_free_reserved_data_space(inode, alloc_end - alloc_start);
 	return ret;
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 0/3] Btrfs: Enhancment for qgroup.
  2015-01-05  6:16                   ` [PATCH v3 0/3] Btrfs: Enhancment for qgroup Dongsheng Yang
                                       ` (2 preceding siblings ...)
  2015-01-05  6:16                     ` [PATCH v3 3/3] Btrfs: qgroup, Account data space in more proper timings Dongsheng Yang
@ 2015-01-07  0:49                     ` Satoru Takeuchi
  2015-01-08  3:53                       ` Dongsheng Yang
  3 siblings, 1 reply; 20+ messages in thread
From: Satoru Takeuchi @ 2015-01-07  0:49 UTC (permalink / raw)
  To: Dongsheng Yang, jbacik; +Cc: linux-btrfs

Hi Yang,

On 2015/01/05 15:16, Dongsheng Yang wrote:
> Hi Josef and others,
> 
> This patch set is about enhancing qgroup.
> 
> [1/3]: fix a bug about qgroup leak when we exceed quota limit,
> 	It is reviewd by Josef.
> [2/3]: introduce a new accounter in qgroup to close a window where
> 	user will exceed the limit by qgroup. It "looks good" to Josef.
> [3/3]: a new patch to fix a bug reported by Satoru.

I tested your the patchset v3. Although it's far better
than the patchset v2, there is still one problem in this patchset.
When I wrote 1.5GiB to a subvolume with 1.0 GiB limit,
1.0GiB - 139 block (in this case, 1KiB/block) was written.

I consider user should be able to write just 1.0GiB in this case.

* Test result

===============================================================================
+ mkfs.btrfs -f /dev/vdb
Btrfs v3.17
See http://btrfs.wiki.kernel.org for more information.

Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
fs created label (null) on /dev/vdb
        nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
+ mount /dev/vdb /root/btrfs-auto-test/
+ ret=0
+ btrfs quota enable /root/btrfs-auto-test/
+ btrfs subvolume create /root/btrfs-auto-test//sub
Create subvolume '/root/btrfs-auto-test/sub'
+ btrfs qgroup limit 1G /root/btrfs-auto-test//sub
+ dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=1500000
dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded
1048438+0 records in                    # Tried to write 1GiB - 138 KiB
1048437+0 records out                   # Succeeded to write 1GiB - 139 KiB
1073599488 bytes (1.1 GB) copied, 19.0247 s, 56.4 MB/s
===============================================================================

* note

I tried to run the reproducer five times and the result is
a bit different for each time.

=========================
#	Written
-------------------------
1	1GiB - 139 KiB
2	1GiB - 139 KiB
3	1GiB - 145 KiB
4	1GiB - 135 KiB
5	1GiB - 135 KiB
==========================

So I consider it's a problem comes from timing.

If I changed the block size from 1KiB to 1 MiB,
the difference in bytes got larger.

============================
#	Written
----------------------------
1	1GiB - 1 MiB
2	1GiB - 1 MiB
3	1GiB - 1 MiB
4	1GiB - 1 MiB
5	1GiB - 1 MiB
============================

Thanks,
Satoru

> 
> BTW, I have some other plan about qgroup in my TODO list:
> 
> Kernel:
> a). adjust the accounters in parent qgroup when we move
> the child qgroup.
> 	Currently, when we move a qgroup, the parent qgroup
> will not updated at the same time. This will cause some wrong
> numbers in qgroup.
> 
> b). add a ioctl to show the qgroup info.
> 	Command "btrfs qgroup show" is showing the qgroup info
> read from qgroup tree. But there is some information in memory
> which is not synced into device. Then it will show some outdate
> number.
> 
> c). limit and account size in 3 modes, data, metadata and both.
> 	qgroup is accounting the size both of data and metadata
> togather, but to a user, the data size is the most useful to them.
> 
> d). remove a subvolume related qgroup when subvolume is deleted and
> there is no other reference to it.
> 
> user-tool:
> a). Add the unit of B/K/M/G to btrfs qgroup show.
> b). get the information via ioctl rather than reading it from
> btree. Will keep the old way as a fallback for compatiblity.
> 
> Any comment and sugguestion is welcome. :)
> 
> Yang
> 
> Dongsheng Yang (3):
>    Btrfs: qgroup: free reserved in exceeding quota.
>    Btrfs: qgroup: Introduce a may_use to account
>      space_info->bytes_may_use.
>    Btrfs: qgroup, Account data space in more proper timings.
> 
>   fs/btrfs/extent-tree.c | 41 +++++++++++++++++++++++-------
>   fs/btrfs/file.c        |  9 -------
>   fs/btrfs/inode.c       | 18 ++++++++++++-
>   fs/btrfs/qgroup.c      | 68 +++++++++++++++++++++++++++++++++++++++++++++++---
>   fs/btrfs/qgroup.h      |  4 +++
>   5 files changed, 117 insertions(+), 23 deletions(-)
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 0/3] Btrfs: Enhancment for qgroup.
  2015-01-07  0:49                     ` [PATCH v3 0/3] Btrfs: Enhancment for qgroup Satoru Takeuchi
@ 2015-01-08  3:53                       ` Dongsheng Yang
  0 siblings, 0 replies; 20+ messages in thread
From: Dongsheng Yang @ 2015-01-08  3:53 UTC (permalink / raw)
  To: Satoru Takeuchi, jbacik; +Cc: linux-btrfs

On 01/07/2015 08:49 AM, Satoru Takeuchi wrote:
> Hi Yang,
>
> On 2015/01/05 15:16, Dongsheng Yang wrote:
>> Hi Josef and others,
>>
>> This patch set is about enhancing qgroup.
>>
>> [1/3]: fix a bug about qgroup leak when we exceed quota limit,
>> 	It is reviewd by Josef.
>> [2/3]: introduce a new accounter in qgroup to close a window where
>> 	user will exceed the limit by qgroup. It "looks good" to Josef.
>> [3/3]: a new patch to fix a bug reported by Satoru.
> I tested your the patchset v3. Although it's far better
> than the patchset v2, there is still one problem in this patchset.
> When I wrote 1.5GiB to a subvolume with 1.0 GiB limit,
> 1.0GiB - 139 block (in this case, 1KiB/block) was written.
>
> I consider user should be able to write just 1.0GiB in this case.

Hi Satoru,

Yes, Currently, user can not write 1.0GiB in this case. Because qgroup
is accounting data and metadata togather. And I have posted an idea
in this thread that split it into three modes, data, metadata and both.

TODO issues:
c). limit and account size in 3 modes, data, metadata and both.
	qgroup is accounting the size both of data and metadata
togather, but to users, the data size is the most useful to them.


But, you mentioned that the result is different in each time.
Hmmm.... there must be something wrong in it. I need some more
investigation to answer this question.

Thanx a lot for your test!
Yang
>
> * Test result
>
> ===============================================================================
> + mkfs.btrfs -f /dev/vdb
> Btrfs v3.17
> See http://btrfs.wiki.kernel.org for more information.
>
> Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
> fs created label (null) on /dev/vdb
>         nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
> + mount /dev/vdb /root/btrfs-auto-test/
> + ret=0
> + btrfs quota enable /root/btrfs-auto-test/
> + btrfs subvolume create /root/btrfs-auto-test//sub
> Create subvolume '/root/btrfs-auto-test/sub'
> + btrfs qgroup limit 1G /root/btrfs-auto-test//sub
> + dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=1500000
> dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded
> 1048438+0 records in                    # Tried to write 1GiB - 138 KiB
> 1048437+0 records out                   # Succeeded to write 1GiB - 139 KiB
> 1073599488 bytes (1.1 GB) copied, 19.0247 s, 56.4 MB/s
> ===============================================================================
>
> * note
>
> I tried to run the reproducer five times and the result is
> a bit different for each time.
>
> =========================
> #	Written
> -------------------------
> 1	1GiB - 139 KiB
> 2	1GiB - 139 KiB
> 3	1GiB - 145 KiB
> 4	1GiB - 135 KiB
> 5	1GiB - 135 KiB
> ==========================
>
> So I consider it's a problem comes from timing.
>
> If I changed the block size from 1KiB to 1 MiB,
> the difference in bytes got larger.
>
> ============================
> #	Written
> ----------------------------
> 1	1GiB - 1 MiB
> 2	1GiB - 1 MiB
> 3	1GiB - 1 MiB
> 4	1GiB - 1 MiB
> 5	1GiB - 1 MiB
> ============================
>
> Thanks,
> Satoru
>
>> BTW, I have some other plan about qgroup in my TODO list:
>>
>> Kernel:
>> a). adjust the accounters in parent qgroup when we move
>> the child qgroup.
>> 	Currently, when we move a qgroup, the parent qgroup
>> will not updated at the same time. This will cause some wrong
>> numbers in qgroup.
>>
>> b). add a ioctl to show the qgroup info.
>> 	Command "btrfs qgroup show" is showing the qgroup info
>> read from qgroup tree. But there is some information in memory
>> which is not synced into device. Then it will show some outdate
>> number.
>>
>> c). limit and account size in 3 modes, data, metadata and both.
>> 	qgroup is accounting the size both of data and metadata
>> togather, but to a user, the data size is the most useful to them.
>>
>> d). remove a subvolume related qgroup when subvolume is deleted and
>> there is no other reference to it.
>>
>> user-tool:
>> a). Add the unit of B/K/M/G to btrfs qgroup show.
>> b). get the information via ioctl rather than reading it from
>> btree. Will keep the old way as a fallback for compatiblity.
>>
>> Any comment and sugguestion is welcome. :)
>>
>> Yang
>>
>> Dongsheng Yang (3):
>>    Btrfs: qgroup: free reserved in exceeding quota.
>>    Btrfs: qgroup: Introduce a may_use to account
>>      space_info->bytes_may_use.
>>    Btrfs: qgroup, Account data space in more proper timings.
>>
>>   fs/btrfs/extent-tree.c | 41 +++++++++++++++++++++++-------
>>   fs/btrfs/file.c        |  9 -------
>>   fs/btrfs/inode.c       | 18 ++++++++++++-
>>   fs/btrfs/qgroup.c      | 68 +++++++++++++++++++++++++++++++++++++++++++++++---
>>   fs/btrfs/qgroup.h      |  4 +++
>>   5 files changed, 117 insertions(+), 23 deletions(-)
>>
> .
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2015-01-08  3:56 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <BA68E495-8B7B-4F0F-A64B-0413B1480B9C@free.fr>
2014-12-09 11:27 ` [PATCH 1/2] Btrfs: qgroup: free reserved in exceeding quota Dongsheng Yang
2014-12-09 11:27   ` [PATCH 2/2] Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use Dongsheng Yang
2014-12-09 15:55     ` Josef Bacik
2014-12-10  8:09       ` Dongsheng Yang
2014-12-09 15:42   ` [PATCH 1/2] Btrfs: qgroup: free reserved in exceeding quota Josef Bacik
2014-12-10  8:09     ` Dongsheng Yang
2014-12-11 18:25       ` Josef Bacik
2014-12-12  0:44         ` Dongsheng Yang
2014-12-12  8:44           ` [PATCH v2 " Dongsheng Yang
2014-12-12  8:44             ` [PATCH v2 2/2] Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use Dongsheng Yang
     [not found]               ` <549CBAB1.3070909@cn.fujitsu.com>
2014-12-26  5:43                 ` Satoru Takeuchi
2014-12-26  6:49                   ` Dongsheng Yang
2014-12-26  7:00                     ` Dongsheng Yang
2014-12-28  2:03                   ` Dongsheng Yang
2015-01-05  6:16                   ` [PATCH v3 0/3] Btrfs: Enhancment for qgroup Dongsheng Yang
2015-01-05  6:16                     ` [PATCH v3 1/3] Btrfs: qgroup: free reserved in exceeding quota Dongsheng Yang
2015-01-05  6:16                     ` [PATCH v3 2/3] Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use Dongsheng Yang
2015-01-05  6:16                     ` [PATCH v3 3/3] Btrfs: qgroup, Account data space in more proper timings Dongsheng Yang
2015-01-07  0:49                     ` [PATCH v3 0/3] Btrfs: Enhancment for qgroup Satoru Takeuchi
2015-01-08  3:53                       ` Dongsheng Yang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).