* [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()
@ 2016-07-06 10:37 Wang Xiaoguang
2016-07-06 10:37 ` [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate() Wang Xiaoguang
2016-07-06 19:54 ` [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Liu Bo
0 siblings, 2 replies; 7+ messages in thread
From: Wang Xiaoguang @ 2016-07-06 10:37 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba
In prealloc_file_extent_cluster(), btrfs_check_data_free_space() uses
wrong file offset for reloc_inode, it uses cluster->start and cluster->end,
which indeed are extent's bytenr. The correct value should be
cluster->[start|end] minus block group's start bytenr.
start bytenr cluster->start
| | extent | extent | ...| extent |
|----------------------------------------------------------------|
| block group reloc_inode |
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
fs/btrfs/relocation.c | 27 +++++++++++++++------------
1 file changed, 15 insertions(+), 12 deletions(-)
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0477dca..abc2f69 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3030,34 +3030,37 @@ int prealloc_file_extent_cluster(struct inode *inode,
u64 num_bytes;
int nr = 0;
int ret = 0;
+ u64 prealloc_start, prealloc_end;
BUG_ON(cluster->start != cluster->boundary[0]);
inode_lock(inode);
- ret = btrfs_check_data_free_space(inode, cluster->start,
- cluster->end + 1 - cluster->start);
+ start = cluster->start - offset;
+ end = cluster->end - offset;
+ ret = btrfs_check_data_free_space(inode, start, end + 1 - start);
if (ret)
goto out;
while (nr < cluster->nr) {
- start = cluster->boundary[nr] - offset;
+ prealloc_start = cluster->boundary[nr] - offset;
if (nr + 1 < cluster->nr)
- end = cluster->boundary[nr + 1] - 1 - offset;
+ prealloc_end = cluster->boundary[nr + 1] - 1 - offset;
else
- end = cluster->end - offset;
+ prealloc_end = cluster->end - offset;
- lock_extent(&BTRFS_I(inode)->io_tree, start, end);
- num_bytes = end + 1 - start;
- ret = btrfs_prealloc_file_range(inode, 0, start,
+ lock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
+ prealloc_end);
+ num_bytes = prealloc_end + 1 - prealloc_start;
+ ret = btrfs_prealloc_file_range(inode, 0, prealloc_start,
num_bytes, num_bytes,
- end + 1, &alloc_hint);
- unlock_extent(&BTRFS_I(inode)->io_tree, start, end);
+ prealloc_end + 1, &alloc_hint);
+ unlock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
+ prealloc_end);
if (ret)
break;
nr++;
}
- btrfs_free_reserved_data_space(inode, cluster->start,
- cluster->end + 1 - cluster->start);
+ btrfs_free_reserved_data_space(inode, start, end + 1 - start);
out:
inode_unlock(inode);
return ret;
--
2.9.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate()
2016-07-06 10:37 [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
@ 2016-07-06 10:37 ` Wang Xiaoguang
2016-07-06 12:27 ` Holger Hoffstätte
2016-07-11 11:34 ` Wang Xiaoguang
2016-07-06 19:54 ` [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Liu Bo
1 sibling, 2 replies; 7+ messages in thread
From: Wang Xiaoguang @ 2016-07-06 10:37 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba
Below test scripts can reproduce this false ENOSPC:
#!/bin/bash
dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
dev=$(losetup --show -f fs.img)
mkfs.btrfs -f -M $dev
mkdir /tmp/mntpoint
mount /dev/loop0 /tmp/mntpoint
cd mntpoint
xfs_io -f -c "falloc 0 $((40*1024*1024))" testfile
Above fallocate(2) operation will fail for ENOSPC reason, but indeed
fs still has free space to satisfy this request. The reason is
btrfs_fallocate() dose not decrease btrfs_space_info's bytes_may_use
just in time, and it calls btrfs_free_reserved_data_space_noquota() in
the end of btrfs_fallocate(), which is too late and have already added
false unnecessary pressure to enospc system. See call graph:
btrfs_fallocate()
|-> btrfs_alloc_data_chunk_ondemand()
It will add btrfs_space_info's bytes_may_use accordingly.
|-> btrfs_prealloc_file_range()
It will call btrfs_reserve_extent(), but note that alloc type is
RESERVE_ALLOC_NO_ACCOUNT, so btrfs_update_reserved_bytes() will
only increase btrfs_space_info's bytes_reserved accordingly, but
will not decrease btrfs_space_info's bytes_may_use, then obviously
we have overestimated real needed disk space, and it'll impact
other processes who do write(2) or fallocate(2) operations, also
can impact metadata reservation in mixed mode, and bytes_max_use
will only be decreased in the end of btrfs_fallocate(). To fix
this false ENOSPC, we need to decrease btrfs_space_info's
bytes_may_use in btrfs_prealloc_file_range() in time, as what we
do in cow_file_range(),
See call graph in :
cow_file_range()
|-> extent_clear_unlock_delalloc()
|-> clear_extent_bit()
|-> btrfs_clear_bit_hook()
|-> btrfs_free_reserved_data_space_noquota()
This function will decrease bytes_may_use accordingly.
So this patch choose to call btrfs_free_reserved_data_space() in
__btrfs_prealloc_file_range() for both successful and failed path.
Also this patch removes some old and useless comments.
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
fs/btrfs/extent-tree.c | 1 -
fs/btrfs/file.c | 23 ++++++++++++-----------
fs/btrfs/inode-map.c | 3 +--
fs/btrfs/inode.c | 12 ++++++++++++
fs/btrfs/relocation.c | 10 +++++++++-
5 files changed, 34 insertions(+), 15 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 82b912a..b0c86d2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3490,7 +3490,6 @@ again:
dcs = BTRFS_DC_SETUP;
else if (ret == -ENOSPC)
set_bit(BTRFS_TRANS_CACHE_ENOSPC, &trans->transaction->flags);
- btrfs_free_reserved_data_space(inode, 0, num_pages);
out_put:
iput(inode);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 2234e88..f872113 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2669,6 +2669,7 @@ static long btrfs_fallocate(struct file *file, int mode,
alloc_start = round_down(offset, blocksize);
alloc_end = round_up(offset + len, blocksize);
+ cur_offset = alloc_start;
/* Make sure we aren't being give some crap mode */
if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
@@ -2761,7 +2762,6 @@ static long btrfs_fallocate(struct file *file, int mode,
/* First, check if we exceed the qgroup limit */
INIT_LIST_HEAD(&reserve_list);
- cur_offset = alloc_start;
while (1) {
em = btrfs_get_extent(inode, NULL, 0, cur_offset,
alloc_end - cur_offset, 0);
@@ -2788,6 +2788,14 @@ static long btrfs_fallocate(struct file *file, int mode,
last_byte - cur_offset);
if (ret < 0)
break;
+ } else {
+ /*
+ * Do not need to reserve unwritten extent for this
+ * range, free reserved data space first, otherwise
+ * it'll result false ENOSPC error.
+ */
+ btrfs_free_reserved_data_space(inode, cur_offset,
+ last_byte - cur_offset);
}
free_extent_map(em);
cur_offset = last_byte;
@@ -2839,18 +2847,11 @@ out_unlock:
unlock_extent_cached(&BTRFS_I(inode)->io_tree, alloc_start, locked_end,
&cached_state, GFP_KERNEL);
out:
- /*
- * As we waited the extent range, the data_rsv_map must be empty
- * in the range, as written data range will be released from it.
- * And for prealloacted extent, it will also be released when
- * its metadata is written.
- * So this is completely used as cleanup.
- */
- btrfs_qgroup_free_data(inode, alloc_start, alloc_end - alloc_start);
inode_unlock(inode);
/* Let go of our reservation. */
- btrfs_free_reserved_data_space(inode, alloc_start,
- alloc_end - alloc_start);
+ if (ret != 0)
+ btrfs_free_reserved_data_space(inode, cur_offset,
+ alloc_end - cur_offset);
return ret;
}
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 70107f7..e59e7d6 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -495,10 +495,9 @@ again:
ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
prealloc, prealloc, &alloc_hint);
if (ret) {
- btrfs_delalloc_release_space(inode, 0, prealloc);
+ btrfs_delalloc_release_metadata(inode, prealloc);
goto out_put;
}
- btrfs_free_reserved_data_space(inode, 0, prealloc);
ret = btrfs_write_out_ino_cache(root, trans, path, inode);
out_put:
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4421954..4dc7c838 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -10269,6 +10269,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
u64 last_alloc = (u64)-1;
int ret = 0;
bool own_trans = true;
+ u64 end = start + num_bytes - 1;
if (trans)
own_trans = false;
@@ -10347,6 +10348,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
}
free_extent_map(em);
next:
+ btrfs_free_reserved_data_space(inode, cur_offset, ins.offset);
num_bytes -= ins.offset;
cur_offset += ins.offset;
*alloc_hint = ins.objectid + ins.offset;
@@ -10377,9 +10379,18 @@ next:
if (own_trans)
btrfs_end_transaction(trans, root);
}
+
+ if (cur_offset < end)
+ btrfs_free_reserved_data_space(inode, cur_offset,
+ end - cur_offset + 1);
return ret;
}
+/*
+ * __btrfs_prealloc_file_range() will call btrfs_free_reserved_data_space()
+ * internally for both sucessful and failed path, btrfs_prealloc_file_range()'s
+ * callers does not need to call btrfs_free_reserved_data_space() any more.
+ */
int btrfs_prealloc_file_range(struct inode *inode, int mode,
u64 start, u64 num_bytes, u64 min_size,
loff_t actual_len, u64 *alloc_hint)
@@ -10389,6 +10400,7 @@ int btrfs_prealloc_file_range(struct inode *inode, int mode,
NULL);
}
+/* Please see same comments in btrfs_prealloc_file_range() */
int btrfs_prealloc_file_range_trans(struct inode *inode,
struct btrfs_trans_handle *trans, int mode,
u64 start, u64 num_bytes, u64 min_size,
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index abc2f69..70756fd 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3031,6 +3031,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
int nr = 0;
int ret = 0;
u64 prealloc_start, prealloc_end;
+ u64 cur_offset;
BUG_ON(cluster->start != cluster->boundary[0]);
inode_lock(inode);
@@ -3041,6 +3042,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
if (ret)
goto out;
+ cur_offset = start;
while (nr < cluster->nr) {
prealloc_start = cluster->boundary[nr] - offset;
if (nr + 1 < cluster->nr)
@@ -3051,16 +3053,22 @@ int prealloc_file_extent_cluster(struct inode *inode,
lock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
prealloc_end);
num_bytes = prealloc_end + 1 - prealloc_start;
+ if (cur_offset < start)
+ btrfs_free_reserved_data_space(inode, cur_offset,
+ start - cur_offset);
ret = btrfs_prealloc_file_range(inode, 0, prealloc_start,
num_bytes, num_bytes,
prealloc_end + 1, &alloc_hint);
unlock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
prealloc_end);
+ cur_offset = prealloc_end + 1;
if (ret)
break;
nr++;
}
- btrfs_free_reserved_data_space(inode, start, end + 1 - start);
+ if (cur_offset < end)
+ btrfs_free_reserved_data_space(inode, cur_offset,
+ end + 1 - cur_offset);
out:
inode_unlock(inode);
return ret;
--
2.9.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate()
2016-07-06 10:37 ` [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate() Wang Xiaoguang
@ 2016-07-06 12:27 ` Holger Hoffstätte
2016-07-07 2:27 ` Wang Xiaoguang
2016-07-11 11:34 ` Wang Xiaoguang
1 sibling, 1 reply; 7+ messages in thread
From: Holger Hoffstätte @ 2016-07-06 12:27 UTC (permalink / raw)
To: Wang Xiaoguang, linux-btrfs; +Cc: dsterba
On 07/06/16 12:37, Wang Xiaoguang wrote:
> Below test scripts can reproduce this false ENOSPC:
> #!/bin/bash
> dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
> dev=$(losetup --show -f fs.img)
> mkfs.btrfs -f -M $dev
> mkdir /tmp/mntpoint
> mount /dev/loop0 /tmp/mntpoint
> cd mntpoint
> xfs_io -f -c "falloc 0 $((40*1024*1024))" testfile
>
> Above fallocate(2) operation will fail for ENOSPC reason, but indeed
> fs still has free space to satisfy this request. The reason is
> btrfs_fallocate() dose not decrease btrfs_space_info's bytes_may_use
> just in time, and it calls btrfs_free_reserved_data_space_noquota() in
> the end of btrfs_fallocate(), which is too late and have already added
> false unnecessary pressure to enospc system. See call graph:
> btrfs_fallocate()
> |-> btrfs_alloc_data_chunk_ondemand()
> It will add btrfs_space_info's bytes_may_use accordingly.
> |-> btrfs_prealloc_file_range()
> It will call btrfs_reserve_extent(), but note that alloc type is
> RESERVE_ALLOC_NO_ACCOUNT, so btrfs_update_reserved_bytes() will
> only increase btrfs_space_info's bytes_reserved accordingly, but
> will not decrease btrfs_space_info's bytes_may_use, then obviously
> we have overestimated real needed disk space, and it'll impact
> other processes who do write(2) or fallocate(2) operations, also
> can impact metadata reservation in mixed mode, and bytes_max_use
> will only be decreased in the end of btrfs_fallocate(). To fix
> this false ENOSPC, we need to decrease btrfs_space_info's
> bytes_may_use in btrfs_prealloc_file_range() in time, as what we
> do in cow_file_range(),
> See call graph in :
> cow_file_range()
> |-> extent_clear_unlock_delalloc()
> |-> clear_extent_bit()
> |-> btrfs_clear_bit_hook()
> |-> btrfs_free_reserved_data_space_noquota()
> This function will decrease bytes_may_use accordingly.
>
> So this patch choose to call btrfs_free_reserved_data_space() in
> __btrfs_prealloc_file_range() for both successful and failed path.
>
> Also this patch removes some old and useless comments.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Verified that the reproducer script indeed fails (with btrfs ~4.7) and
the patch (on top of 1/2) fixes it. Also ran a bunch of other fallocating
things without problem. Free space also still seems sane, as far as I
could tell.
So for both patches:
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
cheers,
Holger
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()
2016-07-06 10:37 [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
2016-07-06 10:37 ` [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate() Wang Xiaoguang
@ 2016-07-06 19:54 ` Liu Bo
2016-07-07 2:26 ` Wang Xiaoguang
1 sibling, 1 reply; 7+ messages in thread
From: Liu Bo @ 2016-07-06 19:54 UTC (permalink / raw)
To: Wang Xiaoguang; +Cc: linux-btrfs, dsterba
On Wed, Jul 06, 2016 at 06:37:52PM +0800, Wang Xiaoguang wrote:
> In prealloc_file_extent_cluster(), btrfs_check_data_free_space() uses
> wrong file offset for reloc_inode, it uses cluster->start and cluster->end,
> which indeed are extent's bytenr. The correct value should be
> cluster->[start|end] minus block group's start bytenr.
>
> start bytenr cluster->start
> | | extent | extent | ...| extent |
> |----------------------------------------------------------------|
> | block group reloc_inode |
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> ---
> fs/btrfs/relocation.c | 27 +++++++++++++++------------
> 1 file changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 0477dca..abc2f69 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -3030,34 +3030,37 @@ int prealloc_file_extent_cluster(struct inode *inode,
> u64 num_bytes;
> int nr = 0;
> int ret = 0;
> + u64 prealloc_start, prealloc_end;
>
> BUG_ON(cluster->start != cluster->boundary[0]);
> inode_lock(inode);
>
> - ret = btrfs_check_data_free_space(inode, cluster->start,
> - cluster->end + 1 - cluster->start);
> + start = cluster->start - offset;
> + end = cluster->end - offset;
> + ret = btrfs_check_data_free_space(inode, start, end + 1 - start);
> if (ret)
> goto out;
>
> while (nr < cluster->nr) {
> - start = cluster->boundary[nr] - offset;
> + prealloc_start = cluster->boundary[nr] - offset;
> if (nr + 1 < cluster->nr)
> - end = cluster->boundary[nr + 1] - 1 - offset;
> + prealloc_end = cluster->boundary[nr + 1] - 1 - offset;
> else
> - end = cluster->end - offset;
> + prealloc_end = cluster->end - offset;
>
> - lock_extent(&BTRFS_I(inode)->io_tree, start, end);
> - num_bytes = end + 1 - start;
> - ret = btrfs_prealloc_file_range(inode, 0, start,
> + lock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
> + prealloc_end);
> + num_bytes = prealloc_end + 1 - prealloc_start;
> + ret = btrfs_prealloc_file_range(inode, 0, prealloc_start,
> num_bytes, num_bytes,
> - end + 1, &alloc_hint);
> - unlock_extent(&BTRFS_I(inode)->io_tree, start, end);
> + prealloc_end + 1, &alloc_hint);
> + unlock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
> + prealloc_end);
Changing names is unnecessary, we can pick up other names for btrfs_{check/free}_data_free_space().
Thanks,
-liubo
> if (ret)
> break;
> nr++;
> }
> - btrfs_free_reserved_data_space(inode, cluster->start,
> - cluster->end + 1 - cluster->start);
> + btrfs_free_reserved_data_space(inode, start, end + 1 - start);
> out:
> inode_unlock(inode);
> return ret;
> --
> 2.9.0
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()
2016-07-06 19:54 ` [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Liu Bo
@ 2016-07-07 2:26 ` Wang Xiaoguang
0 siblings, 0 replies; 7+ messages in thread
From: Wang Xiaoguang @ 2016-07-07 2:26 UTC (permalink / raw)
To: bo.li.liu; +Cc: linux-btrfs, dsterba
hello,
On 07/07/2016 03:54 AM, Liu Bo wrote:
> On Wed, Jul 06, 2016 at 06:37:52PM +0800, Wang Xiaoguang wrote:
>> In prealloc_file_extent_cluster(), btrfs_check_data_free_space() uses
>> wrong file offset for reloc_inode, it uses cluster->start and cluster->end,
>> which indeed are extent's bytenr. The correct value should be
>> cluster->[start|end] minus block group's start bytenr.
>>
>> start bytenr cluster->start
>> | | extent | extent | ...| extent |
>> |----------------------------------------------------------------|
>> | block group reloc_inode |
>>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>> ---
>> fs/btrfs/relocation.c | 27 +++++++++++++++------------
>> 1 file changed, 15 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>> index 0477dca..abc2f69 100644
>> --- a/fs/btrfs/relocation.c
>> +++ b/fs/btrfs/relocation.c
>> @@ -3030,34 +3030,37 @@ int prealloc_file_extent_cluster(struct inode *inode,
>> u64 num_bytes;
>> int nr = 0;
>> int ret = 0;
>> + u64 prealloc_start, prealloc_end;
>>
>> BUG_ON(cluster->start != cluster->boundary[0]);
>> inode_lock(inode);
>>
>> - ret = btrfs_check_data_free_space(inode, cluster->start,
>> - cluster->end + 1 - cluster->start);
>> + start = cluster->start - offset;
>> + end = cluster->end - offset;
>> + ret = btrfs_check_data_free_space(inode, start, end + 1 - start);
>> if (ret)
>> goto out;
>>
>> while (nr < cluster->nr) {
>> - start = cluster->boundary[nr] - offset;
>> + prealloc_start = cluster->boundary[nr] - offset;
>> if (nr + 1 < cluster->nr)
>> - end = cluster->boundary[nr + 1] - 1 - offset;
>> + prealloc_end = cluster->boundary[nr + 1] - 1 - offset;
>> else
>> - end = cluster->end - offset;
>> + prealloc_end = cluster->end - offset;
>>
>> - lock_extent(&BTRFS_I(inode)->io_tree, start, end);
>> - num_bytes = end + 1 - start;
>> - ret = btrfs_prealloc_file_range(inode, 0, start,
>> + lock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
>> + prealloc_end);
>> + num_bytes = prealloc_end + 1 - prealloc_start;
>> + ret = btrfs_prealloc_file_range(inode, 0, prealloc_start,
>> num_bytes, num_bytes,
>> - end + 1, &alloc_hint);
>> - unlock_extent(&BTRFS_I(inode)->io_tree, start, end);
>> + prealloc_end + 1, &alloc_hint);
>> + unlock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
>> + prealloc_end);
> Changing names is unnecessary, we can pick up other names for btrfs_{check/free}_data_free_space().
OK, then the changes will be small, thanks.
Regards,
Xiaoguang Wang
>
> Thanks,
>
> -liubo
>
>> if (ret)
>> break;
>> nr++;
>> }
>> - btrfs_free_reserved_data_space(inode, cluster->start,
>> - cluster->end + 1 - cluster->start);
>> + btrfs_free_reserved_data_space(inode, start, end + 1 - start);
>> out:
>> inode_unlock(inode);
>> return ret;
>> --
>> 2.9.0
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate()
2016-07-06 12:27 ` Holger Hoffstätte
@ 2016-07-07 2:27 ` Wang Xiaoguang
0 siblings, 0 replies; 7+ messages in thread
From: Wang Xiaoguang @ 2016-07-07 2:27 UTC (permalink / raw)
To: Holger Hoffstätte, linux-btrfs; +Cc: dsterba
hello,
On 07/06/2016 08:27 PM, Holger Hoffstätte wrote:
> On 07/06/16 12:37, Wang Xiaoguang wrote:
>> Below test scripts can reproduce this false ENOSPC:
>> #!/bin/bash
>> dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
>> dev=$(losetup --show -f fs.img)
>> mkfs.btrfs -f -M $dev
>> mkdir /tmp/mntpoint
>> mount /dev/loop0 /tmp/mntpoint
>> cd mntpoint
>> xfs_io -f -c "falloc 0 $((40*1024*1024))" testfile
>>
>> Above fallocate(2) operation will fail for ENOSPC reason, but indeed
>> fs still has free space to satisfy this request. The reason is
>> btrfs_fallocate() dose not decrease btrfs_space_info's bytes_may_use
>> just in time, and it calls btrfs_free_reserved_data_space_noquota() in
>> the end of btrfs_fallocate(), which is too late and have already added
>> false unnecessary pressure to enospc system. See call graph:
>> btrfs_fallocate()
>> |-> btrfs_alloc_data_chunk_ondemand()
>> It will add btrfs_space_info's bytes_may_use accordingly.
>> |-> btrfs_prealloc_file_range()
>> It will call btrfs_reserve_extent(), but note that alloc type is
>> RESERVE_ALLOC_NO_ACCOUNT, so btrfs_update_reserved_bytes() will
>> only increase btrfs_space_info's bytes_reserved accordingly, but
>> will not decrease btrfs_space_info's bytes_may_use, then obviously
>> we have overestimated real needed disk space, and it'll impact
>> other processes who do write(2) or fallocate(2) operations, also
>> can impact metadata reservation in mixed mode, and bytes_max_use
>> will only be decreased in the end of btrfs_fallocate(). To fix
>> this false ENOSPC, we need to decrease btrfs_space_info's
>> bytes_may_use in btrfs_prealloc_file_range() in time, as what we
>> do in cow_file_range(),
>> See call graph in :
>> cow_file_range()
>> |-> extent_clear_unlock_delalloc()
>> |-> clear_extent_bit()
>> |-> btrfs_clear_bit_hook()
>> |-> btrfs_free_reserved_data_space_noquota()
>> This function will decrease bytes_may_use accordingly.
>>
>> So this patch choose to call btrfs_free_reserved_data_space() in
>> __btrfs_prealloc_file_range() for both successful and failed path.
>>
>> Also this patch removes some old and useless comments.
>>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> Verified that the reproducer script indeed fails (with btrfs ~4.7) and
> the patch (on top of 1/2) fixes it. Also ran a bunch of other fallocating
> things without problem. Free space also still seems sane, as far as I
> could tell.
>
> So for both patches:
>
> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Thanks very much :)
Regards,
Xiaoguang Wang
>
> cheers,
> Holger
>
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate()
2016-07-06 10:37 ` [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate() Wang Xiaoguang
2016-07-06 12:27 ` Holger Hoffstätte
@ 2016-07-11 11:34 ` Wang Xiaoguang
1 sibling, 0 replies; 7+ messages in thread
From: Wang Xiaoguang @ 2016-07-11 11:34 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba
hello,
Please ignore this patch, though this patch is correct to me, and pass
the fstests test. I have prepared a new common patch to fix this false
ENOSPC bug. Currently I'm doing fstests test, and will sent them
tomorrow, thanks.
Regards,
Xiaoguang Wang
On 07/06/2016 06:37 PM, Wang Xiaoguang wrote:
> Below test scripts can reproduce this false ENOSPC:
> #!/bin/bash
> dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
> dev=$(losetup --show -f fs.img)
> mkfs.btrfs -f -M $dev
> mkdir /tmp/mntpoint
> mount /dev/loop0 /tmp/mntpoint
> cd mntpoint
> xfs_io -f -c "falloc 0 $((40*1024*1024))" testfile
>
> Above fallocate(2) operation will fail for ENOSPC reason, but indeed
> fs still has free space to satisfy this request. The reason is
> btrfs_fallocate() dose not decrease btrfs_space_info's bytes_may_use
> just in time, and it calls btrfs_free_reserved_data_space_noquota() in
> the end of btrfs_fallocate(), which is too late and have already added
> false unnecessary pressure to enospc system. See call graph:
> btrfs_fallocate()
> |-> btrfs_alloc_data_chunk_ondemand()
> It will add btrfs_space_info's bytes_may_use accordingly.
> |-> btrfs_prealloc_file_range()
> It will call btrfs_reserve_extent(), but note that alloc type is
> RESERVE_ALLOC_NO_ACCOUNT, so btrfs_update_reserved_bytes() will
> only increase btrfs_space_info's bytes_reserved accordingly, but
> will not decrease btrfs_space_info's bytes_may_use, then obviously
> we have overestimated real needed disk space, and it'll impact
> other processes who do write(2) or fallocate(2) operations, also
> can impact metadata reservation in mixed mode, and bytes_max_use
> will only be decreased in the end of btrfs_fallocate(). To fix
> this false ENOSPC, we need to decrease btrfs_space_info's
> bytes_may_use in btrfs_prealloc_file_range() in time, as what we
> do in cow_file_range(),
> See call graph in :
> cow_file_range()
> |-> extent_clear_unlock_delalloc()
> |-> clear_extent_bit()
> |-> btrfs_clear_bit_hook()
> |-> btrfs_free_reserved_data_space_noquota()
> This function will decrease bytes_may_use accordingly.
>
> So this patch choose to call btrfs_free_reserved_data_space() in
> __btrfs_prealloc_file_range() for both successful and failed path.
>
> Also this patch removes some old and useless comments.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> ---
> fs/btrfs/extent-tree.c | 1 -
> fs/btrfs/file.c | 23 ++++++++++++-----------
> fs/btrfs/inode-map.c | 3 +--
> fs/btrfs/inode.c | 12 ++++++++++++
> fs/btrfs/relocation.c | 10 +++++++++-
> 5 files changed, 34 insertions(+), 15 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 82b912a..b0c86d2 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3490,7 +3490,6 @@ again:
> dcs = BTRFS_DC_SETUP;
> else if (ret == -ENOSPC)
> set_bit(BTRFS_TRANS_CACHE_ENOSPC, &trans->transaction->flags);
> - btrfs_free_reserved_data_space(inode, 0, num_pages);
>
> out_put:
> iput(inode);
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 2234e88..f872113 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2669,6 +2669,7 @@ static long btrfs_fallocate(struct file *file, int mode,
>
> alloc_start = round_down(offset, blocksize);
> alloc_end = round_up(offset + len, blocksize);
> + cur_offset = alloc_start;
>
> /* Make sure we aren't being give some crap mode */
> if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
> @@ -2761,7 +2762,6 @@ static long btrfs_fallocate(struct file *file, int mode,
>
> /* First, check if we exceed the qgroup limit */
> INIT_LIST_HEAD(&reserve_list);
> - cur_offset = alloc_start;
> while (1) {
> em = btrfs_get_extent(inode, NULL, 0, cur_offset,
> alloc_end - cur_offset, 0);
> @@ -2788,6 +2788,14 @@ static long btrfs_fallocate(struct file *file, int mode,
> last_byte - cur_offset);
> if (ret < 0)
> break;
> + } else {
> + /*
> + * Do not need to reserve unwritten extent for this
> + * range, free reserved data space first, otherwise
> + * it'll result false ENOSPC error.
> + */
> + btrfs_free_reserved_data_space(inode, cur_offset,
> + last_byte - cur_offset);
> }
> free_extent_map(em);
> cur_offset = last_byte;
> @@ -2839,18 +2847,11 @@ out_unlock:
> unlock_extent_cached(&BTRFS_I(inode)->io_tree, alloc_start, locked_end,
> &cached_state, GFP_KERNEL);
> out:
> - /*
> - * As we waited the extent range, the data_rsv_map must be empty
> - * in the range, as written data range will be released from it.
> - * And for prealloacted extent, it will also be released when
> - * its metadata is written.
> - * So this is completely used as cleanup.
> - */
> - btrfs_qgroup_free_data(inode, alloc_start, alloc_end - alloc_start);
> inode_unlock(inode);
> /* Let go of our reservation. */
> - btrfs_free_reserved_data_space(inode, alloc_start,
> - alloc_end - alloc_start);
> + if (ret != 0)
> + btrfs_free_reserved_data_space(inode, cur_offset,
> + alloc_end - cur_offset);
> return ret;
> }
>
> diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
> index 70107f7..e59e7d6 100644
> --- a/fs/btrfs/inode-map.c
> +++ b/fs/btrfs/inode-map.c
> @@ -495,10 +495,9 @@ again:
> ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
> prealloc, prealloc, &alloc_hint);
> if (ret) {
> - btrfs_delalloc_release_space(inode, 0, prealloc);
> + btrfs_delalloc_release_metadata(inode, prealloc);
> goto out_put;
> }
> - btrfs_free_reserved_data_space(inode, 0, prealloc);
>
> ret = btrfs_write_out_ino_cache(root, trans, path, inode);
> out_put:
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 4421954..4dc7c838 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -10269,6 +10269,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
> u64 last_alloc = (u64)-1;
> int ret = 0;
> bool own_trans = true;
> + u64 end = start + num_bytes - 1;
>
> if (trans)
> own_trans = false;
> @@ -10347,6 +10348,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
> }
> free_extent_map(em);
> next:
> + btrfs_free_reserved_data_space(inode, cur_offset, ins.offset);
> num_bytes -= ins.offset;
> cur_offset += ins.offset;
> *alloc_hint = ins.objectid + ins.offset;
> @@ -10377,9 +10379,18 @@ next:
> if (own_trans)
> btrfs_end_transaction(trans, root);
> }
> +
> + if (cur_offset < end)
> + btrfs_free_reserved_data_space(inode, cur_offset,
> + end - cur_offset + 1);
> return ret;
> }
>
> +/*
> + * __btrfs_prealloc_file_range() will call btrfs_free_reserved_data_space()
> + * internally for both sucessful and failed path, btrfs_prealloc_file_range()'s
> + * callers does not need to call btrfs_free_reserved_data_space() any more.
> + */
> int btrfs_prealloc_file_range(struct inode *inode, int mode,
> u64 start, u64 num_bytes, u64 min_size,
> loff_t actual_len, u64 *alloc_hint)
> @@ -10389,6 +10400,7 @@ int btrfs_prealloc_file_range(struct inode *inode, int mode,
> NULL);
> }
>
> +/* Please see same comments in btrfs_prealloc_file_range() */
> int btrfs_prealloc_file_range_trans(struct inode *inode,
> struct btrfs_trans_handle *trans, int mode,
> u64 start, u64 num_bytes, u64 min_size,
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index abc2f69..70756fd 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -3031,6 +3031,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
> int nr = 0;
> int ret = 0;
> u64 prealloc_start, prealloc_end;
> + u64 cur_offset;
>
> BUG_ON(cluster->start != cluster->boundary[0]);
> inode_lock(inode);
> @@ -3041,6 +3042,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
> if (ret)
> goto out;
>
> + cur_offset = start;
> while (nr < cluster->nr) {
> prealloc_start = cluster->boundary[nr] - offset;
> if (nr + 1 < cluster->nr)
> @@ -3051,16 +3053,22 @@ int prealloc_file_extent_cluster(struct inode *inode,
> lock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
> prealloc_end);
> num_bytes = prealloc_end + 1 - prealloc_start;
> + if (cur_offset < start)
> + btrfs_free_reserved_data_space(inode, cur_offset,
> + start - cur_offset);
> ret = btrfs_prealloc_file_range(inode, 0, prealloc_start,
> num_bytes, num_bytes,
> prealloc_end + 1, &alloc_hint);
> unlock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
> prealloc_end);
> + cur_offset = prealloc_end + 1;
> if (ret)
> break;
> nr++;
> }
> - btrfs_free_reserved_data_space(inode, start, end + 1 - start);
> + if (cur_offset < end)
> + btrfs_free_reserved_data_space(inode, cur_offset,
> + end + 1 - cur_offset);
> out:
> inode_unlock(inode);
> return ret;
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2016-07-11 11:36 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-07-06 10:37 [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
2016-07-06 10:37 ` [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate() Wang Xiaoguang
2016-07-06 12:27 ` Holger Hoffstätte
2016-07-07 2:27 ` Wang Xiaoguang
2016-07-11 11:34 ` Wang Xiaoguang
2016-07-06 19:54 ` [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Liu Bo
2016-07-07 2:26 ` Wang Xiaoguang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).