From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
To: <linux-btrfs@vger.kernel.org>
Cc: <dsterba@suse.cz>
Subject: Re: [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate()
Date: Mon, 11 Jul 2016 19:34:18 +0800 [thread overview]
Message-ID: <5783843A.5080809@cn.fujitsu.com> (raw)
In-Reply-To: <20160706103753.4908-2-wangxg.fnst@cn.fujitsu.com>
hello,
Please ignore this patch, though this patch is correct to me, and pass
the fstests test. I have prepared a new common patch to fix this false
ENOSPC bug. Currently I'm doing fstests test, and will sent them
tomorrow, thanks.
Regards,
Xiaoguang Wang
On 07/06/2016 06:37 PM, Wang Xiaoguang wrote:
> Below test scripts can reproduce this false ENOSPC:
> #!/bin/bash
> dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
> dev=$(losetup --show -f fs.img)
> mkfs.btrfs -f -M $dev
> mkdir /tmp/mntpoint
> mount /dev/loop0 /tmp/mntpoint
> cd mntpoint
> xfs_io -f -c "falloc 0 $((40*1024*1024))" testfile
>
> Above fallocate(2) operation will fail for ENOSPC reason, but indeed
> fs still has free space to satisfy this request. The reason is
> btrfs_fallocate() dose not decrease btrfs_space_info's bytes_may_use
> just in time, and it calls btrfs_free_reserved_data_space_noquota() in
> the end of btrfs_fallocate(), which is too late and have already added
> false unnecessary pressure to enospc system. See call graph:
> btrfs_fallocate()
> |-> btrfs_alloc_data_chunk_ondemand()
> It will add btrfs_space_info's bytes_may_use accordingly.
> |-> btrfs_prealloc_file_range()
> It will call btrfs_reserve_extent(), but note that alloc type is
> RESERVE_ALLOC_NO_ACCOUNT, so btrfs_update_reserved_bytes() will
> only increase btrfs_space_info's bytes_reserved accordingly, but
> will not decrease btrfs_space_info's bytes_may_use, then obviously
> we have overestimated real needed disk space, and it'll impact
> other processes who do write(2) or fallocate(2) operations, also
> can impact metadata reservation in mixed mode, and bytes_max_use
> will only be decreased in the end of btrfs_fallocate(). To fix
> this false ENOSPC, we need to decrease btrfs_space_info's
> bytes_may_use in btrfs_prealloc_file_range() in time, as what we
> do in cow_file_range(),
> See call graph in :
> cow_file_range()
> |-> extent_clear_unlock_delalloc()
> |-> clear_extent_bit()
> |-> btrfs_clear_bit_hook()
> |-> btrfs_free_reserved_data_space_noquota()
> This function will decrease bytes_may_use accordingly.
>
> So this patch choose to call btrfs_free_reserved_data_space() in
> __btrfs_prealloc_file_range() for both successful and failed path.
>
> Also this patch removes some old and useless comments.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> ---
> fs/btrfs/extent-tree.c | 1 -
> fs/btrfs/file.c | 23 ++++++++++++-----------
> fs/btrfs/inode-map.c | 3 +--
> fs/btrfs/inode.c | 12 ++++++++++++
> fs/btrfs/relocation.c | 10 +++++++++-
> 5 files changed, 34 insertions(+), 15 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 82b912a..b0c86d2 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3490,7 +3490,6 @@ again:
> dcs = BTRFS_DC_SETUP;
> else if (ret == -ENOSPC)
> set_bit(BTRFS_TRANS_CACHE_ENOSPC, &trans->transaction->flags);
> - btrfs_free_reserved_data_space(inode, 0, num_pages);
>
> out_put:
> iput(inode);
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 2234e88..f872113 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2669,6 +2669,7 @@ static long btrfs_fallocate(struct file *file, int mode,
>
> alloc_start = round_down(offset, blocksize);
> alloc_end = round_up(offset + len, blocksize);
> + cur_offset = alloc_start;
>
> /* Make sure we aren't being give some crap mode */
> if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
> @@ -2761,7 +2762,6 @@ static long btrfs_fallocate(struct file *file, int mode,
>
> /* First, check if we exceed the qgroup limit */
> INIT_LIST_HEAD(&reserve_list);
> - cur_offset = alloc_start;
> while (1) {
> em = btrfs_get_extent(inode, NULL, 0, cur_offset,
> alloc_end - cur_offset, 0);
> @@ -2788,6 +2788,14 @@ static long btrfs_fallocate(struct file *file, int mode,
> last_byte - cur_offset);
> if (ret < 0)
> break;
> + } else {
> + /*
> + * Do not need to reserve unwritten extent for this
> + * range, free reserved data space first, otherwise
> + * it'll result false ENOSPC error.
> + */
> + btrfs_free_reserved_data_space(inode, cur_offset,
> + last_byte - cur_offset);
> }
> free_extent_map(em);
> cur_offset = last_byte;
> @@ -2839,18 +2847,11 @@ out_unlock:
> unlock_extent_cached(&BTRFS_I(inode)->io_tree, alloc_start, locked_end,
> &cached_state, GFP_KERNEL);
> out:
> - /*
> - * As we waited the extent range, the data_rsv_map must be empty
> - * in the range, as written data range will be released from it.
> - * And for prealloacted extent, it will also be released when
> - * its metadata is written.
> - * So this is completely used as cleanup.
> - */
> - btrfs_qgroup_free_data(inode, alloc_start, alloc_end - alloc_start);
> inode_unlock(inode);
> /* Let go of our reservation. */
> - btrfs_free_reserved_data_space(inode, alloc_start,
> - alloc_end - alloc_start);
> + if (ret != 0)
> + btrfs_free_reserved_data_space(inode, cur_offset,
> + alloc_end - cur_offset);
> return ret;
> }
>
> diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
> index 70107f7..e59e7d6 100644
> --- a/fs/btrfs/inode-map.c
> +++ b/fs/btrfs/inode-map.c
> @@ -495,10 +495,9 @@ again:
> ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
> prealloc, prealloc, &alloc_hint);
> if (ret) {
> - btrfs_delalloc_release_space(inode, 0, prealloc);
> + btrfs_delalloc_release_metadata(inode, prealloc);
> goto out_put;
> }
> - btrfs_free_reserved_data_space(inode, 0, prealloc);
>
> ret = btrfs_write_out_ino_cache(root, trans, path, inode);
> out_put:
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 4421954..4dc7c838 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -10269,6 +10269,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
> u64 last_alloc = (u64)-1;
> int ret = 0;
> bool own_trans = true;
> + u64 end = start + num_bytes - 1;
>
> if (trans)
> own_trans = false;
> @@ -10347,6 +10348,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
> }
> free_extent_map(em);
> next:
> + btrfs_free_reserved_data_space(inode, cur_offset, ins.offset);
> num_bytes -= ins.offset;
> cur_offset += ins.offset;
> *alloc_hint = ins.objectid + ins.offset;
> @@ -10377,9 +10379,18 @@ next:
> if (own_trans)
> btrfs_end_transaction(trans, root);
> }
> +
> + if (cur_offset < end)
> + btrfs_free_reserved_data_space(inode, cur_offset,
> + end - cur_offset + 1);
> return ret;
> }
>
> +/*
> + * __btrfs_prealloc_file_range() will call btrfs_free_reserved_data_space()
> + * internally for both sucessful and failed path, btrfs_prealloc_file_range()'s
> + * callers does not need to call btrfs_free_reserved_data_space() any more.
> + */
> int btrfs_prealloc_file_range(struct inode *inode, int mode,
> u64 start, u64 num_bytes, u64 min_size,
> loff_t actual_len, u64 *alloc_hint)
> @@ -10389,6 +10400,7 @@ int btrfs_prealloc_file_range(struct inode *inode, int mode,
> NULL);
> }
>
> +/* Please see same comments in btrfs_prealloc_file_range() */
> int btrfs_prealloc_file_range_trans(struct inode *inode,
> struct btrfs_trans_handle *trans, int mode,
> u64 start, u64 num_bytes, u64 min_size,
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index abc2f69..70756fd 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -3031,6 +3031,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
> int nr = 0;
> int ret = 0;
> u64 prealloc_start, prealloc_end;
> + u64 cur_offset;
>
> BUG_ON(cluster->start != cluster->boundary[0]);
> inode_lock(inode);
> @@ -3041,6 +3042,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
> if (ret)
> goto out;
>
> + cur_offset = start;
> while (nr < cluster->nr) {
> prealloc_start = cluster->boundary[nr] - offset;
> if (nr + 1 < cluster->nr)
> @@ -3051,16 +3053,22 @@ int prealloc_file_extent_cluster(struct inode *inode,
> lock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
> prealloc_end);
> num_bytes = prealloc_end + 1 - prealloc_start;
> + if (cur_offset < start)
> + btrfs_free_reserved_data_space(inode, cur_offset,
> + start - cur_offset);
> ret = btrfs_prealloc_file_range(inode, 0, prealloc_start,
> num_bytes, num_bytes,
> prealloc_end + 1, &alloc_hint);
> unlock_extent(&BTRFS_I(inode)->io_tree, prealloc_start,
> prealloc_end);
> + cur_offset = prealloc_end + 1;
> if (ret)
> break;
> nr++;
> }
> - btrfs_free_reserved_data_space(inode, start, end + 1 - start);
> + if (cur_offset < end)
> + btrfs_free_reserved_data_space(inode, cur_offset,
> + end + 1 - cur_offset);
> out:
> inode_unlock(inode);
> return ret;
next prev parent reply other threads:[~2016-07-11 11:36 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-07-06 10:37 [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
2016-07-06 10:37 ` [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate() Wang Xiaoguang
2016-07-06 12:27 ` Holger Hoffstätte
2016-07-07 2:27 ` Wang Xiaoguang
2016-07-11 11:34 ` Wang Xiaoguang [this message]
2016-07-06 19:54 ` [PATCH 1/2] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Liu Bo
2016-07-07 2:26 ` Wang Xiaoguang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5783843A.5080809@cn.fujitsu.com \
--to=wangxg.fnst@cn.fujitsu.com \
--cc=dsterba@suse.cz \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.