linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()
@ 2016-07-06 10:37 Wang Xiaoguang
  2016-07-06 10:37 ` [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate() Wang Xiaoguang
  2016-07-06 19:54 ` [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Liu Bo
  0 siblings, 2 replies; 7+ messages in thread
From: Wang Xiaoguang @ 2016-07-06 10:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba

In prealloc_file_extent_cluster(), btrfs_check_data_free_space() uses
wrong file offset for reloc_inode, it uses cluster->start and cluster->end,
which indeed are extent's bytenr. The correct value should be
cluster->[start|end] minus block group's start bytenr.

start bytenr   cluster->start
|              |     extent      |   extent   | ...| extent |
|----------------------------------------------------------------|
|                block group reloc_inode                         |

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/relocation.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0477dca..abc2f69 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3030,34 +3030,37 @@ int prealloc_file_extent_cluster(struct inode *inode,
 	u64 num_bytes;
 	int nr = 0;
 	int ret = 0;
+	u64 prealloc_start, prealloc_end;
 
 	BUG_ON(cluster->start != cluster->boundary[0]);
 	inode_lock(inode);
 
-	ret = btrfs_check_data_free_space(inode, cluster->start,
-					  cluster->end + 1 - cluster->start);
+	start = cluster->start - offset;
+	end = cluster->end - offset;
+	ret = btrfs_check_data_free_space(inode, start, end + 1 - start);
 	if (ret)
 		goto out;
 
 	while (nr < cluster->nr) {
-		start = cluster->boundary[nr] - offset;
+		prealloc_start = cluster->boundary[nr] - offset;
 		if (nr + 1 < cluster->nr)
-			end = cluster->boundary[nr + 1] - 1 - offset;
+			prealloc_end = cluster->boundary[nr + 1] - 1 - offset;
 		else
-			end = cluster->end - offset;
+			prealloc_end = cluster->end - offset;
 
-		lock_extent(&BTRFS_I(inode)->io_tree, start, end);
-		num_bytes = end + 1 - start;
-		ret = btrfs_prealloc_file_range(inode, 0, start,
+		lock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
+			    prealloc_end);
+		num_bytes = prealloc_end + 1 - prealloc_start;
+		ret = btrfs_prealloc_file_range(inode, 0, prealloc_start,
 						num_bytes, num_bytes,
-						end + 1, &alloc_hint);
-		unlock_extent(&BTRFS_I(inode)->io_tree, start, end);
+						prealloc_end + 1, &alloc_hint);
+		unlock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
+			      prealloc_end);
 		if (ret)
 			break;
 		nr++;
 	}
-	btrfs_free_reserved_data_space(inode, cluster->start,
-				       cluster->end + 1 - cluster->start);
+	btrfs_free_reserved_data_space(inode, start, end + 1 - start);
 out:
 	inode_unlock(inode);
 	return ret;
-- 
2.9.0




^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate()
  2016-07-06 10:37 [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
@ 2016-07-06 10:37 ` Wang Xiaoguang
  2016-07-06 12:27   ` Holger Hoffstätte
  2016-07-11 11:34   ` Wang Xiaoguang
  2016-07-06 19:54 ` [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Liu Bo
  1 sibling, 2 replies; 7+ messages in thread
From: Wang Xiaoguang @ 2016-07-06 10:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba

Below test scripts can reproduce this false ENOSPC:
	#!/bin/bash
	dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
	dev=$(losetup --show -f fs.img)
	mkfs.btrfs -f -M $dev
	mkdir /tmp/mntpoint
	mount /dev/loop0 /tmp/mntpoint
	cd mntpoint
	xfs_io -f -c "falloc 0 $((40*1024*1024))" testfile

Above fallocate(2) operation will fail for ENOSPC reason, but indeed
fs still has free space to satisfy this request. The reason is
btrfs_fallocate() dose not decrease btrfs_space_info's bytes_may_use
just in time, and it calls btrfs_free_reserved_data_space_noquota() in
the end of btrfs_fallocate(), which is too late and have already added
false unnecessary pressure to enospc system. See call graph:
btrfs_fallocate()
|-> btrfs_alloc_data_chunk_ondemand()
    It will add btrfs_space_info's bytes_may_use accordingly.
|-> btrfs_prealloc_file_range()
    It will call btrfs_reserve_extent(), but note that alloc type is
    RESERVE_ALLOC_NO_ACCOUNT, so btrfs_update_reserved_bytes() will
    only increase btrfs_space_info's bytes_reserved accordingly, but
    will not decrease btrfs_space_info's bytes_may_use, then obviously
    we have overestimated real needed disk space, and it'll impact
    other processes who do write(2) or fallocate(2) operations, also
    can impact metadata reservation in mixed mode, and bytes_max_use
    will only be decreased in the end of btrfs_fallocate(). To fix
    this false ENOSPC, we need to decrease btrfs_space_info's
    bytes_may_use in btrfs_prealloc_file_range() in time, as what we
    do in cow_file_range(),
    See call graph in :
    cow_file_range()
    |-> extent_clear_unlock_delalloc()
        |-> clear_extent_bit()
            |-> btrfs_clear_bit_hook()
                |-> btrfs_free_reserved_data_space_noquota()
                    This function will decrease bytes_may_use accordingly.

So this patch choose to call btrfs_free_reserved_data_space() in
__btrfs_prealloc_file_range() for both successful and failed path.

Also this patch removes some old and useless comments.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c |  1 -
 fs/btrfs/file.c        | 23 ++++++++++++-----------
 fs/btrfs/inode-map.c   |  3 +--
 fs/btrfs/inode.c       | 12 ++++++++++++
 fs/btrfs/relocation.c  | 10 +++++++++-
 5 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 82b912a..b0c86d2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3490,7 +3490,6 @@ again:
 		dcs = BTRFS_DC_SETUP;
 	else if (ret == -ENOSPC)
 		set_bit(BTRFS_TRANS_CACHE_ENOSPC, &trans->transaction->flags);
-	btrfs_free_reserved_data_space(inode, 0, num_pages);
 
 out_put:
 	iput(inode);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 2234e88..f872113 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2669,6 +2669,7 @@ static long btrfs_fallocate(struct file *file, int mode,
 
 	alloc_start = round_down(offset, blocksize);
 	alloc_end = round_up(offset + len, blocksize);
+	cur_offset = alloc_start;
 
 	/* Make sure we aren't being give some crap mode */
 	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
@@ -2761,7 +2762,6 @@ static long btrfs_fallocate(struct file *file, int mode,
 
 	/* First, check if we exceed the qgroup limit */
 	INIT_LIST_HEAD(&reserve_list);
-	cur_offset = alloc_start;
 	while (1) {
 		em = btrfs_get_extent(inode, NULL, 0, cur_offset,
 				      alloc_end - cur_offset, 0);
@@ -2788,6 +2788,14 @@ static long btrfs_fallocate(struct file *file, int mode,
 					last_byte - cur_offset);
 			if (ret < 0)
 				break;
+		} else {
+			/*
+			 * Do not need to reserve unwritten extent for this
+			 * range, free reserved data space first, otherwise
+			 * it'll result false ENOSPC error.
+			 */
+			btrfs_free_reserved_data_space(inode, cur_offset,
+				last_byte - cur_offset);
 		}
 		free_extent_map(em);
 		cur_offset = last_byte;
@@ -2839,18 +2847,11 @@ out_unlock:
 	unlock_extent_cached(&BTRFS_I(inode)->io_tree, alloc_start, locked_end,
 			     &cached_state, GFP_KERNEL);
 out:
-	/*
-	 * As we waited the extent range, the data_rsv_map must be empty
-	 * in the range, as written data range will be released from it.
-	 * And for prealloacted extent, it will also be released when
-	 * its metadata is written.
-	 * So this is completely used as cleanup.
-	 */
-	btrfs_qgroup_free_data(inode, alloc_start, alloc_end - alloc_start);
 	inode_unlock(inode);
 	/* Let go of our reservation. */
-	btrfs_free_reserved_data_space(inode, alloc_start,
-				       alloc_end - alloc_start);
+	if (ret != 0)
+		btrfs_free_reserved_data_space(inode, cur_offset,
+					       alloc_end - cur_offset);
 	return ret;
 }
 
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 70107f7..e59e7d6 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -495,10 +495,9 @@ again:
 	ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
 					      prealloc, prealloc, &alloc_hint);
 	if (ret) {
-		btrfs_delalloc_release_space(inode, 0, prealloc);
+		btrfs_delalloc_release_metadata(inode, prealloc);
 		goto out_put;
 	}
-	btrfs_free_reserved_data_space(inode, 0, prealloc);
 
 	ret = btrfs_write_out_ino_cache(root, trans, path, inode);
 out_put:
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4421954..4dc7c838 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -10269,6 +10269,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 	u64 last_alloc = (u64)-1;
 	int ret = 0;
 	bool own_trans = true;
+	u64 end = start + num_bytes - 1;
 
 	if (trans)
 		own_trans = false;
@@ -10347,6 +10348,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 		}
 		free_extent_map(em);
 next:
+		btrfs_free_reserved_data_space(inode, cur_offset, ins.offset);
 		num_bytes -= ins.offset;
 		cur_offset += ins.offset;
 		*alloc_hint = ins.objectid + ins.offset;
@@ -10377,9 +10379,18 @@ next:
 		if (own_trans)
 			btrfs_end_transaction(trans, root);
 	}
+
+	if (cur_offset < end)
+		btrfs_free_reserved_data_space(inode, cur_offset,
+			end - cur_offset + 1);
 	return ret;
 }
 
+/*
+ * __btrfs_prealloc_file_range() will call btrfs_free_reserved_data_space()
+ * internally for both sucessful and failed path, btrfs_prealloc_file_range()'s
+ * callers does not need to call btrfs_free_reserved_data_space() any more.
+ */
 int btrfs_prealloc_file_range(struct inode *inode, int mode,
 			      u64 start, u64 num_bytes, u64 min_size,
 			      loff_t actual_len, u64 *alloc_hint)
@@ -10389,6 +10400,7 @@ int btrfs_prealloc_file_range(struct inode *inode, int mode,
 					   NULL);
 }
 
+/* Please see same comments in btrfs_prealloc_file_range() */
 int btrfs_prealloc_file_range_trans(struct inode *inode,
 				    struct btrfs_trans_handle *trans, int mode,
 				    u64 start, u64 num_bytes, u64 min_size,
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index abc2f69..70756fd 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3031,6 +3031,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
 	int nr = 0;
 	int ret = 0;
 	u64 prealloc_start, prealloc_end;
+	u64 cur_offset;
 
 	BUG_ON(cluster->start != cluster->boundary[0]);
 	inode_lock(inode);
@@ -3041,6 +3042,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
 	if (ret)
 		goto out;
 
+	cur_offset = start;
 	while (nr < cluster->nr) {
 		prealloc_start = cluster->boundary[nr] - offset;
 		if (nr + 1 < cluster->nr)
@@ -3051,16 +3053,22 @@ int prealloc_file_extent_cluster(struct inode *inode,
 		lock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
 			    prealloc_end);
 		num_bytes = prealloc_end + 1 - prealloc_start;
+		if (cur_offset < start)
+			btrfs_free_reserved_data_space(inode, cur_offset,
+				start - cur_offset);
 		ret = btrfs_prealloc_file_range(inode, 0, prealloc_start,
 						num_bytes, num_bytes,
 						prealloc_end + 1, &alloc_hint);
 		unlock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
 			      prealloc_end);
+		cur_offset = prealloc_end + 1;
 		if (ret)
 			break;
 		nr++;
 	}
-	btrfs_free_reserved_data_space(inode, start, end + 1 - start);
+	if (cur_offset < end)
+		btrfs_free_reserved_data_space(inode, cur_offset,
+				end + 1 - cur_offset);
 out:
 	inode_unlock(inode);
 	return ret;
-- 
2.9.0




^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate()
  2016-07-06 10:37 ` [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate() Wang Xiaoguang
@ 2016-07-06 12:27   ` Holger Hoffstätte
  2016-07-07  2:27     ` Wang Xiaoguang
  2016-07-11 11:34   ` Wang Xiaoguang
  1 sibling, 1 reply; 7+ messages in thread
From: Holger Hoffstätte @ 2016-07-06 12:27 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba

On 07/06/16 12:37, Wang Xiaoguang wrote:
> Below test scripts can reproduce this false ENOSPC:
> 	#!/bin/bash
> 	dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
> 	dev=$(losetup --show -f fs.img)
> 	mkfs.btrfs -f -M $dev
> 	mkdir /tmp/mntpoint
> 	mount /dev/loop0 /tmp/mntpoint
> 	cd mntpoint
> 	xfs_io -f -c "falloc 0 $((40*1024*1024))" testfile
> 
> Above fallocate(2) operation will fail for ENOSPC reason, but indeed
> fs still has free space to satisfy this request. The reason is
> btrfs_fallocate() dose not decrease btrfs_space_info's bytes_may_use
> just in time, and it calls btrfs_free_reserved_data_space_noquota() in
> the end of btrfs_fallocate(), which is too late and have already added
> false unnecessary pressure to enospc system. See call graph:
> btrfs_fallocate()
> |-> btrfs_alloc_data_chunk_ondemand()
>     It will add btrfs_space_info's bytes_may_use accordingly.
> |-> btrfs_prealloc_file_range()
>     It will call btrfs_reserve_extent(), but note that alloc type is
>     RESERVE_ALLOC_NO_ACCOUNT, so btrfs_update_reserved_bytes() will
>     only increase btrfs_space_info's bytes_reserved accordingly, but
>     will not decrease btrfs_space_info's bytes_may_use, then obviously
>     we have overestimated real needed disk space, and it'll impact
>     other processes who do write(2) or fallocate(2) operations, also
>     can impact metadata reservation in mixed mode, and bytes_max_use
>     will only be decreased in the end of btrfs_fallocate(). To fix
>     this false ENOSPC, we need to decrease btrfs_space_info's
>     bytes_may_use in btrfs_prealloc_file_range() in time, as what we
>     do in cow_file_range(),
>     See call graph in :
>     cow_file_range()
>     |-> extent_clear_unlock_delalloc()
>         |-> clear_extent_bit()
>             |-> btrfs_clear_bit_hook()
>                 |-> btrfs_free_reserved_data_space_noquota()
>                     This function will decrease bytes_may_use accordingly.
> 
> So this patch choose to call btrfs_free_reserved_data_space() in
> __btrfs_prealloc_file_range() for both successful and failed path.
> 
> Also this patch removes some old and useless comments.
> 
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>

Verified that the reproducer script indeed fails (with btrfs ~4.7) and
the patch (on top of 1/2) fixes it. Also ran a bunch of other fallocating
things without problem. Free space also still seems sane, as far as I
could tell.

So for both patches:

Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>

cheers,
Holger


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()
  2016-07-06 10:37 [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
  2016-07-06 10:37 ` [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate() Wang Xiaoguang
@ 2016-07-06 19:54 ` Liu Bo
  2016-07-07  2:26   ` Wang Xiaoguang
  1 sibling, 1 reply; 7+ messages in thread
From: Liu Bo @ 2016-07-06 19:54 UTC (permalink / raw)
  To: Wang Xiaoguang; +Cc: linux-btrfs, dsterba

On Wed, Jul 06, 2016 at 06:37:52PM +0800, Wang Xiaoguang wrote:
> In prealloc_file_extent_cluster(), btrfs_check_data_free_space() uses
> wrong file offset for reloc_inode, it uses cluster->start and cluster->end,
> which indeed are extent's bytenr. The correct value should be
> cluster->[start|end] minus block group's start bytenr.
> 
> start bytenr   cluster->start
> |              |     extent      |   extent   | ...| extent |
> |----------------------------------------------------------------|
> |                block group reloc_inode                         |
> 
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> ---
>  fs/btrfs/relocation.c | 27 +++++++++++++++------------
>  1 file changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 0477dca..abc2f69 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -3030,34 +3030,37 @@ int prealloc_file_extent_cluster(struct inode *inode,
>  	u64 num_bytes;
>  	int nr = 0;
>  	int ret = 0;
> +	u64 prealloc_start, prealloc_end;
>  
>  	BUG_ON(cluster->start != cluster->boundary[0]);
>  	inode_lock(inode);
>  
> -	ret = btrfs_check_data_free_space(inode, cluster->start,
> -					  cluster->end + 1 - cluster->start);
> +	start = cluster->start - offset;
> +	end = cluster->end - offset;
> +	ret = btrfs_check_data_free_space(inode, start, end + 1 - start);
>  	if (ret)
>  		goto out;
>  
>  	while (nr < cluster->nr) {
> -		start = cluster->boundary[nr] - offset;
> +		prealloc_start = cluster->boundary[nr] - offset;
>  		if (nr + 1 < cluster->nr)
> -			end = cluster->boundary[nr + 1] - 1 - offset;
> +			prealloc_end = cluster->boundary[nr + 1] - 1 - offset;
>  		else
> -			end = cluster->end - offset;
> +			prealloc_end = cluster->end - offset;
>  
> -		lock_extent(&BTRFS_I(inode)->io_tree, start, end);
> -		num_bytes = end + 1 - start;
> -		ret = btrfs_prealloc_file_range(inode, 0, start,
> +		lock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
> +			    prealloc_end);
> +		num_bytes = prealloc_end + 1 - prealloc_start;
> +		ret = btrfs_prealloc_file_range(inode, 0, prealloc_start,
>  						num_bytes, num_bytes,
> -						end + 1, &alloc_hint);
> -		unlock_extent(&BTRFS_I(inode)->io_tree, start, end);
> +						prealloc_end + 1, &alloc_hint);
> +		unlock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
> +			      prealloc_end);

Changing names is unnecessary, we can pick up other names for btrfs_{check/free}_data_free_space().

Thanks,

-liubo

>  		if (ret)
>  			break;
>  		nr++;
>  	}
> -	btrfs_free_reserved_data_space(inode, cluster->start,
> -				       cluster->end + 1 - cluster->start);
> +	btrfs_free_reserved_data_space(inode, start, end + 1 - start);
>  out:
>  	inode_unlock(inode);
>  	return ret;
> -- 
> 2.9.0
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()
  2016-07-06 19:54 ` [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Liu Bo
@ 2016-07-07  2:26   ` Wang Xiaoguang
  0 siblings, 0 replies; 7+ messages in thread
From: Wang Xiaoguang @ 2016-07-07  2:26 UTC (permalink / raw)
  To: bo.li.liu; +Cc: linux-btrfs, dsterba

hello,

On 07/07/2016 03:54 AM, Liu Bo wrote:
> On Wed, Jul 06, 2016 at 06:37:52PM +0800, Wang Xiaoguang wrote:
>> In prealloc_file_extent_cluster(), btrfs_check_data_free_space() uses
>> wrong file offset for reloc_inode, it uses cluster->start and cluster->end,
>> which indeed are extent's bytenr. The correct value should be
>> cluster->[start|end] minus block group's start bytenr.
>>
>> start bytenr   cluster->start
>> |              |     extent      |   extent   | ...| extent |
>> |----------------------------------------------------------------|
>> |                block group reloc_inode                         |
>>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>> ---
>>   fs/btrfs/relocation.c | 27 +++++++++++++++------------
>>   1 file changed, 15 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>> index 0477dca..abc2f69 100644
>> --- a/fs/btrfs/relocation.c
>> +++ b/fs/btrfs/relocation.c
>> @@ -3030,34 +3030,37 @@ int prealloc_file_extent_cluster(struct inode *inode,
>>   	u64 num_bytes;
>>   	int nr = 0;
>>   	int ret = 0;
>> +	u64 prealloc_start, prealloc_end;
>>   
>>   	BUG_ON(cluster->start != cluster->boundary[0]);
>>   	inode_lock(inode);
>>   
>> -	ret = btrfs_check_data_free_space(inode, cluster->start,
>> -					  cluster->end + 1 - cluster->start);
>> +	start = cluster->start - offset;
>> +	end = cluster->end - offset;
>> +	ret = btrfs_check_data_free_space(inode, start, end + 1 - start);
>>   	if (ret)
>>   		goto out;
>>   
>>   	while (nr < cluster->nr) {
>> -		start = cluster->boundary[nr] - offset;
>> +		prealloc_start = cluster->boundary[nr] - offset;
>>   		if (nr + 1 < cluster->nr)
>> -			end = cluster->boundary[nr + 1] - 1 - offset;
>> +			prealloc_end = cluster->boundary[nr + 1] - 1 - offset;
>>   		else
>> -			end = cluster->end - offset;
>> +			prealloc_end = cluster->end - offset;
>>   
>> -		lock_extent(&BTRFS_I(inode)->io_tree, start, end);
>> -		num_bytes = end + 1 - start;
>> -		ret = btrfs_prealloc_file_range(inode, 0, start,
>> +		lock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
>> +			    prealloc_end);
>> +		num_bytes = prealloc_end + 1 - prealloc_start;
>> +		ret = btrfs_prealloc_file_range(inode, 0, prealloc_start,
>>   						num_bytes, num_bytes,
>> -						end + 1, &alloc_hint);
>> -		unlock_extent(&BTRFS_I(inode)->io_tree, start, end);
>> +						prealloc_end + 1, &alloc_hint);
>> +		unlock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
>> +			      prealloc_end);
> Changing names is unnecessary, we can pick up other names for btrfs_{check/free}_data_free_space().
OK, then the changes will be small, thanks.

Regards,
Xiaoguang Wang

>
> Thanks,
>
> -liubo
>
>>   		if (ret)
>>   			break;
>>   		nr++;
>>   	}
>> -	btrfs_free_reserved_data_space(inode, cluster->start,
>> -				       cluster->end + 1 - cluster->start);
>> +	btrfs_free_reserved_data_space(inode, start, end + 1 - start);
>>   out:
>>   	inode_unlock(inode);
>>   	return ret;
>> -- 
>> 2.9.0
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate()
  2016-07-06 12:27   ` Holger Hoffstätte
@ 2016-07-07  2:27     ` Wang Xiaoguang
  0 siblings, 0 replies; 7+ messages in thread
From: Wang Xiaoguang @ 2016-07-07  2:27 UTC (permalink / raw)
  To: Holger Hoffstätte, linux-btrfs; +Cc: dsterba

hello,

On 07/06/2016 08:27 PM, Holger Hoffstätte wrote:
> On 07/06/16 12:37, Wang Xiaoguang wrote:
>> Below test scripts can reproduce this false ENOSPC:
>> 	#!/bin/bash
>> 	dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
>> 	dev=$(losetup --show -f fs.img)
>> 	mkfs.btrfs -f -M $dev
>> 	mkdir /tmp/mntpoint
>> 	mount /dev/loop0 /tmp/mntpoint
>> 	cd mntpoint
>> 	xfs_io -f -c "falloc 0 $((40*1024*1024))" testfile
>>
>> Above fallocate(2) operation will fail for ENOSPC reason, but indeed
>> fs still has free space to satisfy this request. The reason is
>> btrfs_fallocate() dose not decrease btrfs_space_info's bytes_may_use
>> just in time, and it calls btrfs_free_reserved_data_space_noquota() in
>> the end of btrfs_fallocate(), which is too late and have already added
>> false unnecessary pressure to enospc system. See call graph:
>> btrfs_fallocate()
>> |-> btrfs_alloc_data_chunk_ondemand()
>>      It will add btrfs_space_info's bytes_may_use accordingly.
>> |-> btrfs_prealloc_file_range()
>>      It will call btrfs_reserve_extent(), but note that alloc type is
>>      RESERVE_ALLOC_NO_ACCOUNT, so btrfs_update_reserved_bytes() will
>>      only increase btrfs_space_info's bytes_reserved accordingly, but
>>      will not decrease btrfs_space_info's bytes_may_use, then obviously
>>      we have overestimated real needed disk space, and it'll impact
>>      other processes who do write(2) or fallocate(2) operations, also
>>      can impact metadata reservation in mixed mode, and bytes_max_use
>>      will only be decreased in the end of btrfs_fallocate(). To fix
>>      this false ENOSPC, we need to decrease btrfs_space_info's
>>      bytes_may_use in btrfs_prealloc_file_range() in time, as what we
>>      do in cow_file_range(),
>>      See call graph in :
>>      cow_file_range()
>>      |-> extent_clear_unlock_delalloc()
>>          |-> clear_extent_bit()
>>              |-> btrfs_clear_bit_hook()
>>                  |-> btrfs_free_reserved_data_space_noquota()
>>                      This function will decrease bytes_may_use accordingly.
>>
>> So this patch choose to call btrfs_free_reserved_data_space() in
>> __btrfs_prealloc_file_range() for both successful and failed path.
>>
>> Also this patch removes some old and useless comments.
>>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> Verified that the reproducer script indeed fails (with btrfs ~4.7) and
> the patch (on top of 1/2) fixes it. Also ran a bunch of other fallocating
> things without problem. Free space also still seems sane, as far as I
> could tell.
>
> So for both patches:
>
> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Thanks very much :)

Regards,
Xiaoguang Wang

>
> cheers,
> Holger
>
>
>




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate()
  2016-07-06 10:37 ` [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate() Wang Xiaoguang
  2016-07-06 12:27   ` Holger Hoffstätte
@ 2016-07-11 11:34   ` Wang Xiaoguang
  1 sibling, 0 replies; 7+ messages in thread
From: Wang Xiaoguang @ 2016-07-11 11:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba

hello,

Please ignore this patch, though this patch is correct to me, and pass
the fstests test. I have prepared a new common patch to fix this false
ENOSPC bug. Currently I'm doing fstests test, and will sent them
tomorrow, thanks.

Regards,
Xiaoguang Wang

On 07/06/2016 06:37 PM, Wang Xiaoguang wrote:
> Below test scripts can reproduce this false ENOSPC:
> 	#!/bin/bash
> 	dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
> 	dev=$(losetup --show -f fs.img)
> 	mkfs.btrfs -f -M $dev
> 	mkdir /tmp/mntpoint
> 	mount /dev/loop0 /tmp/mntpoint
> 	cd mntpoint
> 	xfs_io -f -c "falloc 0 $((40*1024*1024))" testfile
>
> Above fallocate(2) operation will fail for ENOSPC reason, but indeed
> fs still has free space to satisfy this request. The reason is
> btrfs_fallocate() dose not decrease btrfs_space_info's bytes_may_use
> just in time, and it calls btrfs_free_reserved_data_space_noquota() in
> the end of btrfs_fallocate(), which is too late and have already added
> false unnecessary pressure to enospc system. See call graph:
> btrfs_fallocate()
> |-> btrfs_alloc_data_chunk_ondemand()
>      It will add btrfs_space_info's bytes_may_use accordingly.
> |-> btrfs_prealloc_file_range()
>      It will call btrfs_reserve_extent(), but note that alloc type is
>      RESERVE_ALLOC_NO_ACCOUNT, so btrfs_update_reserved_bytes() will
>      only increase btrfs_space_info's bytes_reserved accordingly, but
>      will not decrease btrfs_space_info's bytes_may_use, then obviously
>      we have overestimated real needed disk space, and it'll impact
>      other processes who do write(2) or fallocate(2) operations, also
>      can impact metadata reservation in mixed mode, and bytes_max_use
>      will only be decreased in the end of btrfs_fallocate(). To fix
>      this false ENOSPC, we need to decrease btrfs_space_info's
>      bytes_may_use in btrfs_prealloc_file_range() in time, as what we
>      do in cow_file_range(),
>      See call graph in :
>      cow_file_range()
>      |-> extent_clear_unlock_delalloc()
>          |-> clear_extent_bit()
>              |-> btrfs_clear_bit_hook()
>                  |-> btrfs_free_reserved_data_space_noquota()
>                      This function will decrease bytes_may_use accordingly.
>
> So this patch choose to call btrfs_free_reserved_data_space() in
> __btrfs_prealloc_file_range() for both successful and failed path.
>
> Also this patch removes some old and useless comments.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> ---
>   fs/btrfs/extent-tree.c |  1 -
>   fs/btrfs/file.c        | 23 ++++++++++++-----------
>   fs/btrfs/inode-map.c   |  3 +--
>   fs/btrfs/inode.c       | 12 ++++++++++++
>   fs/btrfs/relocation.c  | 10 +++++++++-
>   5 files changed, 34 insertions(+), 15 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 82b912a..b0c86d2 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3490,7 +3490,6 @@ again:
>   		dcs = BTRFS_DC_SETUP;
>   	else if (ret == -ENOSPC)
>   		set_bit(BTRFS_TRANS_CACHE_ENOSPC, &trans->transaction->flags);
> -	btrfs_free_reserved_data_space(inode, 0, num_pages);
>   
>   out_put:
>   	iput(inode);
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 2234e88..f872113 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2669,6 +2669,7 @@ static long btrfs_fallocate(struct file *file, int mode,
>   
>   	alloc_start = round_down(offset, blocksize);
>   	alloc_end = round_up(offset + len, blocksize);
> +	cur_offset = alloc_start;
>   
>   	/* Make sure we aren't being give some crap mode */
>   	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
> @@ -2761,7 +2762,6 @@ static long btrfs_fallocate(struct file *file, int mode,
>   
>   	/* First, check if we exceed the qgroup limit */
>   	INIT_LIST_HEAD(&reserve_list);
> -	cur_offset = alloc_start;
>   	while (1) {
>   		em = btrfs_get_extent(inode, NULL, 0, cur_offset,
>   				      alloc_end - cur_offset, 0);
> @@ -2788,6 +2788,14 @@ static long btrfs_fallocate(struct file *file, int mode,
>   					last_byte - cur_offset);
>   			if (ret < 0)
>   				break;
> +		} else {
> +			/*
> +			 * Do not need to reserve unwritten extent for this
> +			 * range, free reserved data space first, otherwise
> +			 * it'll result false ENOSPC error.
> +			 */
> +			btrfs_free_reserved_data_space(inode, cur_offset,
> +				last_byte - cur_offset);
>   		}
>   		free_extent_map(em);
>   		cur_offset = last_byte;
> @@ -2839,18 +2847,11 @@ out_unlock:
>   	unlock_extent_cached(&BTRFS_I(inode)->io_tree, alloc_start, locked_end,
>   			     &cached_state, GFP_KERNEL);
>   out:
> -	/*
> -	 * As we waited the extent range, the data_rsv_map must be empty
> -	 * in the range, as written data range will be released from it.
> -	 * And for prealloacted extent, it will also be released when
> -	 * its metadata is written.
> -	 * So this is completely used as cleanup.
> -	 */
> -	btrfs_qgroup_free_data(inode, alloc_start, alloc_end - alloc_start);
>   	inode_unlock(inode);
>   	/* Let go of our reservation. */
> -	btrfs_free_reserved_data_space(inode, alloc_start,
> -				       alloc_end - alloc_start);
> +	if (ret != 0)
> +		btrfs_free_reserved_data_space(inode, cur_offset,
> +					       alloc_end - cur_offset);
>   	return ret;
>   }
>   
> diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
> index 70107f7..e59e7d6 100644
> --- a/fs/btrfs/inode-map.c
> +++ b/fs/btrfs/inode-map.c
> @@ -495,10 +495,9 @@ again:
>   	ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
>   					      prealloc, prealloc, &alloc_hint);
>   	if (ret) {
> -		btrfs_delalloc_release_space(inode, 0, prealloc);
> +		btrfs_delalloc_release_metadata(inode, prealloc);
>   		goto out_put;
>   	}
> -	btrfs_free_reserved_data_space(inode, 0, prealloc);
>   
>   	ret = btrfs_write_out_ino_cache(root, trans, path, inode);
>   out_put:
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 4421954..4dc7c838 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -10269,6 +10269,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
>   	u64 last_alloc = (u64)-1;
>   	int ret = 0;
>   	bool own_trans = true;
> +	u64 end = start + num_bytes - 1;
>   
>   	if (trans)
>   		own_trans = false;
> @@ -10347,6 +10348,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
>   		}
>   		free_extent_map(em);
>   next:
> +		btrfs_free_reserved_data_space(inode, cur_offset, ins.offset);
>   		num_bytes -= ins.offset;
>   		cur_offset += ins.offset;
>   		*alloc_hint = ins.objectid + ins.offset;
> @@ -10377,9 +10379,18 @@ next:
>   		if (own_trans)
>   			btrfs_end_transaction(trans, root);
>   	}
> +
> +	if (cur_offset < end)
> +		btrfs_free_reserved_data_space(inode, cur_offset,
> +			end - cur_offset + 1);
>   	return ret;
>   }
>   
> +/*
> + * __btrfs_prealloc_file_range() will call btrfs_free_reserved_data_space()
> + * internally for both sucessful and failed path, btrfs_prealloc_file_range()'s
> + * callers does not need to call btrfs_free_reserved_data_space() any more.
> + */
>   int btrfs_prealloc_file_range(struct inode *inode, int mode,
>   			      u64 start, u64 num_bytes, u64 min_size,
>   			      loff_t actual_len, u64 *alloc_hint)
> @@ -10389,6 +10400,7 @@ int btrfs_prealloc_file_range(struct inode *inode, int mode,
>   					   NULL);
>   }
>   
> +/* Please see same comments in btrfs_prealloc_file_range() */
>   int btrfs_prealloc_file_range_trans(struct inode *inode,
>   				    struct btrfs_trans_handle *trans, int mode,
>   				    u64 start, u64 num_bytes, u64 min_size,
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index abc2f69..70756fd 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -3031,6 +3031,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
>   	int nr = 0;
>   	int ret = 0;
>   	u64 prealloc_start, prealloc_end;
> +	u64 cur_offset;
>   
>   	BUG_ON(cluster->start != cluster->boundary[0]);
>   	inode_lock(inode);
> @@ -3041,6 +3042,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
>   	if (ret)
>   		goto out;
>   
> +	cur_offset = start;
>   	while (nr < cluster->nr) {
>   		prealloc_start = cluster->boundary[nr] - offset;
>   		if (nr + 1 < cluster->nr)
> @@ -3051,16 +3053,22 @@ int prealloc_file_extent_cluster(struct inode *inode,
>   		lock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
>   			    prealloc_end);
>   		num_bytes = prealloc_end + 1 - prealloc_start;
> +		if (cur_offset < start)
> +			btrfs_free_reserved_data_space(inode, cur_offset,
> +				start - cur_offset);
>   		ret = btrfs_prealloc_file_range(inode, 0, prealloc_start,
>   						num_bytes, num_bytes,
>   						prealloc_end + 1, &alloc_hint);
>   		unlock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
>   			      prealloc_end);
> +		cur_offset = prealloc_end + 1;
>   		if (ret)
>   			break;
>   		nr++;
>   	}
> -	btrfs_free_reserved_data_space(inode, start, end + 1 - start);
> +	if (cur_offset < end)
> +		btrfs_free_reserved_data_space(inode, cur_offset,
> +				end + 1 - cur_offset);
>   out:
>   	inode_unlock(inode);
>   	return ret;




^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-07-11 11:36 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-07-06 10:37 [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
2016-07-06 10:37 ` [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate() Wang Xiaoguang
2016-07-06 12:27   ` Holger Hoffstätte
2016-07-07  2:27     ` Wang Xiaoguang
2016-07-11 11:34   ` Wang Xiaoguang
2016-07-06 19:54 ` [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Liu Bo
2016-07-07  2:26   ` Wang Xiaoguang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).