From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
To: <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH 2/2] btrfs: fix false enospc for compression
Date: Tue, 1 Nov 2016 18:28:19 +0800
Message-ID: <58186E43.1030906@cn.fujitsu.com>
In-Reply-To: <20161101101842.12630-2-wangxg.fnst@cn.fujitsu.com>

Hi,

I have rebased these 2 patches against Linux 4.9-rc2; sorry for the delay.
After applying these patches, Stefan no longer sees any ENOSPC errors :)
If you have some free time, please take a look. Thanks.

Regards,
Xiaoguang Wang

On 11/01/2016 06:18 PM, Wang Xiaoguang wrote:
> When testing btrfs compression, we sometimes got ENOSPC errors even though the
> fs still had plenty of free space. xfstests generic/171, generic/172,
> generic/173, generic/174 and generic/175 can reveal this bug in my test
> environment when compression is enabled.
>
> After some debugging, we found that it is btrfs_delalloc_reserve_metadata()
> which sometimes tries to reserve a large amount of metadata space, even for a
> very small data range. In btrfs_delalloc_reserve_metadata(), the number of
> metadata bytes we try to reserve is calculated from the difference between
> outstanding_extents and reserved_extents. The following case shows how ENOSPC
> occurs:
>
>    1, Buffered write 128MB of data in units of 128KB, so finally the inode's
> outstanding_extents will be 1 and reserved_extents will be 1024. Note that it
> is btrfs_merge_extent_hook() that merges these 128KB units into one big
> outstanding extent, but it does not change reserved_extents.
>
>    2, When writing dirty pages with compression, cow_file_range_async() will
> split the above big extent into units of 128KB (the compression extent size is
> 128KB). When the first split operation finishes, we'll have 2 outstanding
> extents and 1024 reserved extents. If the ordered extent just generated is
> dispatched to run and completes right away, btrfs_delalloc_release_metadata()
> (see btrfs_finish_ordered_io()) will be called to release metadata, after
> which we will have 1 outstanding extent and 1 reserved extent (also see the
> logic in drop_outstanding_extent()). Later cow_file_range_async() continues to
> handle the remaining data range [128KB, 128MB), and if no other ordered extent
> is dispatched to run, there will be 1023 outstanding extents and 1 reserved
> extent.
>
>    3, Now if another buffered write to this file comes in,
> btrfs_delalloc_reserve_metadata() will try to reserve metadata for at least
> 1023 outstanding extents. For a 16KB node size that is 1023*16384*2*8, about
> 255MB; for a 64KB node size it is 1023*65536*8*2, about 1GB of metadata. This
> is obviously not sane and can easily result in an ENOSPC error.
>
> The root cause is that for compression the max extent size is no longer
> BTRFS_MAX_EXTENT_SIZE (128MB) but 128KB, so the current metadata reservation
> method in btrfs is not appropriate. Here we introduce:
> 	enum btrfs_metadata_reserve_type {
>          	BTRFS_RESERVE_NORMAL,
>          	BTRFS_RESERVE_COMPRESS,
> 	};
> and extend btrfs_delalloc_reserve_metadata() and btrfs_delalloc_reserve_space()
> with a new enum btrfs_metadata_reserve_type argument. When a data range will
> go through compression, we use BTRFS_RESERVE_COMPRESS to reserve metadata.
> We also introduce the EXTENT_COMPRESS flag to mark a data range that will go
> through the compression path.
>
> With this patch, we can fix these false ENOSPC errors for compression.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
> Tested-by: Stefan Priebe <s.priebe@profihost.ag>
> ---
>   fs/btrfs/ctree.h             |  31 ++++++--
>   fs/btrfs/extent-tree.c       |  55 +++++++++----
>   fs/btrfs/extent_io.c         |  59 +++++++++++++-
>   fs/btrfs/extent_io.h         |   2 +
>   fs/btrfs/file.c              |  28 +++++--
>   fs/btrfs/free-space-cache.c  |   6 +-
>   fs/btrfs/inode-map.c         |   5 +-
>   fs/btrfs/inode.c             | 185 ++++++++++++++++++++++++++++++++-----------
>   fs/btrfs/ioctl.c             |  12 ++-
>   fs/btrfs/relocation.c        |  14 +++-
>   fs/btrfs/tests/inode-tests.c |  15 ++--
>   11 files changed, 315 insertions(+), 97 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 52a8535..457b106 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -98,6 +98,19 @@ static const int btrfs_csum_sizes[] = { 4 };
>   
>   #define BTRFS_DIRTY_METADATA_THRESH	SZ_32M
>   
> +/*
> + * for compression, max file extent size would be limited to 128K, so when
> + * reserving metadata for such delalloc writes, pass BTRFS_RESERVE_COMPRESS to
> + * btrfs_delalloc_reserve_metadata() or btrfs_delalloc_reserve_space() to
> + * calculate metadata; for non-compression, use BTRFS_RESERVE_NORMAL.
> + */
> +enum btrfs_metadata_reserve_type {
> +	BTRFS_RESERVE_NORMAL,
> +	BTRFS_RESERVE_COMPRESS,
> +};
> +int inode_need_compress(struct inode *inode);
> +u64 btrfs_max_extent_size(enum btrfs_metadata_reserve_type reserve_type);
> +
>   #define BTRFS_MAX_EXTENT_SIZE SZ_128M
>   
>   struct btrfs_mapping_tree {
> @@ -2693,10 +2706,14 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
>   void btrfs_subvolume_release_metadata(struct btrfs_root *root,
>   				      struct btrfs_block_rsv *rsv,
>   				      u64 qgroup_reserved);
> -int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes);
> -void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes);
> -int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len);
> -void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len);
> +int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes,
> +		enum btrfs_metadata_reserve_type reserve_type);
> +void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes,
> +		enum btrfs_metadata_reserve_type reserve_type);
> +int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len,
> +		enum btrfs_metadata_reserve_type reserve_type);
> +void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len,
> +		enum btrfs_metadata_reserve_type reserve_type);
>   void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type);
>   struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root,
>   					      unsigned short type);
> @@ -3138,9 +3155,9 @@ int btrfs_start_delalloc_inodes(struct btrfs_root *root, int delay_iput);
>   int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int delay_iput,
>   			       int nr);
>   int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
> -			      struct extent_state **cached_state, int dedupe);
> +			      struct extent_state **cached_state, int flag);
>   int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
> -			    struct extent_state **cached_state);
> +			    struct extent_state **cached_state, int flag);
>   int btrfs_create_subvol_root(struct btrfs_trans_handle *trans,
>   			     struct btrfs_root *new_root,
>   			     struct btrfs_root *parent_root,
> @@ -3232,7 +3249,7 @@ int btrfs_release_file(struct inode *inode, struct file *file);
>   int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
>   		      struct page **pages, size_t num_pages,
>   		      loff_t pos, size_t write_bytes,
> -		      struct extent_state **cached);
> +		      struct extent_state **cached, int flag);
>   int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
>   ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
>   			      struct file *file_out, loff_t pos_out,
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 210c94a..9aa6d2c 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -5838,15 +5838,16 @@ void btrfs_subvolume_release_metadata(struct btrfs_root *root,
>    * reserved extents that need to be freed.  This must be called with
>    * BTRFS_I(inode)->lock held.
>    */
> -static unsigned drop_outstanding_extent(struct inode *inode, u64 num_bytes)
> +static unsigned drop_outstanding_extent(struct inode *inode, u64 num_bytes,
> +			enum btrfs_metadata_reserve_type reserve_type)
>   {
>   	unsigned drop_inode_space = 0;
>   	unsigned dropped_extents = 0;
>   	unsigned num_extents = 0;
> +	u64 max_extent_size = btrfs_max_extent_size(reserve_type);
>   
> -	num_extents = (unsigned)div64_u64(num_bytes +
> -					  BTRFS_MAX_EXTENT_SIZE - 1,
> -					  BTRFS_MAX_EXTENT_SIZE);
> +	num_extents = (unsigned)div64_u64(num_bytes + max_extent_size - 1,
> +					  max_extent_size);
>   	ASSERT(num_extents);
>   	ASSERT(BTRFS_I(inode)->outstanding_extents >= num_extents);
>   	BTRFS_I(inode)->outstanding_extents -= num_extents;
> @@ -5916,7 +5917,21 @@ static u64 calc_csum_metadata_size(struct inode *inode, u64 num_bytes,
>   	return btrfs_calc_trans_metadata_size(root, old_csums - num_csums);
>   }
>   
> -int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
> +u64 btrfs_max_extent_size(enum btrfs_metadata_reserve_type reserve_type)
> +{
> +	if (reserve_type == BTRFS_RESERVE_COMPRESS)
> +		return SZ_128K;
> +
> +	return BTRFS_MAX_EXTENT_SIZE;
> +}
> +
> +/*
> + * @reserve_type: normally reserve_type should be BTRFS_RESERVE_NORMAL, but for
> + * compression path, its max extent size is limited to 128KB, not 128MB, when
> + * reserving metadata, we should set reserve_type to BTRFS_RESERVE_COMPRESS.
> + */
> +int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes,
> +		enum btrfs_metadata_reserve_type reserve_type)
>   {
>   	struct btrfs_root *root = BTRFS_I(inode)->root;
>   	struct btrfs_block_rsv *block_rsv = &root->fs_info->delalloc_block_rsv;
> @@ -5929,6 +5944,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
>   	u64 to_free = 0;
>   	unsigned dropped;
>   	bool release_extra = false;
> +	u64 max_extent_size = btrfs_max_extent_size(reserve_type);
>   
>   	/* If we are a free space inode we need to not flush since we will be in
>   	 * the middle of a transaction commit.  We also don't need the delalloc
> @@ -5955,9 +5971,8 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
>   	num_bytes = ALIGN(num_bytes, root->sectorsize);
>   
>   	spin_lock(&BTRFS_I(inode)->lock);
> -	nr_extents = (unsigned)div64_u64(num_bytes +
> -					 BTRFS_MAX_EXTENT_SIZE - 1,
> -					 BTRFS_MAX_EXTENT_SIZE);
> +	nr_extents = (unsigned)div64_u64(num_bytes + max_extent_size - 1,
> +					 max_extent_size);
>   	BTRFS_I(inode)->outstanding_extents += nr_extents;
>   
>   	nr_extents = 0;
> @@ -6008,7 +6023,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
>   
>   out_fail:
>   	spin_lock(&BTRFS_I(inode)->lock);
> -	dropped = drop_outstanding_extent(inode, num_bytes);
> +	dropped = drop_outstanding_extent(inode, num_bytes, reserve_type);
>   	/*
>   	 * If the inodes csum_bytes is the same as the original
>   	 * csum_bytes then we know we haven't raced with any free()ers
> @@ -6074,12 +6089,15 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
>    * btrfs_delalloc_release_metadata - release a metadata reservation for an inode
>    * @inode: the inode to release the reservation for
>    * @num_bytes: the number of bytes we're releasing
> + * @reserve_type: this value must be the same as the value passed to
> + * btrfs_delalloc_reserve_metadata().
>    *
>    * This will release the metadata reservation for an inode.  This can be called
>    * once we complete IO for a given set of bytes to release their metadata
>    * reservations.
>    */
> -void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
> +void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes,
> +		enum btrfs_metadata_reserve_type reserve_type)
>   {
>   	struct btrfs_root *root = BTRFS_I(inode)->root;
>   	u64 to_free = 0;
> @@ -6087,7 +6105,7 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
>   
>   	num_bytes = ALIGN(num_bytes, root->sectorsize);
>   	spin_lock(&BTRFS_I(inode)->lock);
> -	dropped = drop_outstanding_extent(inode, num_bytes);
> +	dropped = drop_outstanding_extent(inode, num_bytes, reserve_type);
>   
>   	if (num_bytes)
>   		to_free = calc_csum_metadata_size(inode, num_bytes, 0);
> @@ -6111,6 +6129,9 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
>    * @inode: inode we're writing to
>    * @start: start range we are writing to
>    * @len: how long the range we are writing to
> + * @reserve_type: normally reserve_type should be BTRFS_RESERVE_NORMAL, but for
> + * compression path, its max extent size is limited to 128KB, not 128MB, when
> + * reserving metadata, we should set reserve_type to BTRFS_RESERVE_COMPRESS.
>    *
>    * This will do the following things
>    *
> @@ -6128,14 +6149,15 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
>    * Return 0 for success
>    * Return <0 for error(-ENOSPC or -EQUOT)
>    */
> -int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len)
> +int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len,
> +		enum btrfs_metadata_reserve_type reserve_type)
>   {
>   	int ret;
>   
>   	ret = btrfs_check_data_free_space(inode, start, len);
>   	if (ret < 0)
>   		return ret;
> -	ret = btrfs_delalloc_reserve_metadata(inode, len);
> +	ret = btrfs_delalloc_reserve_metadata(inode, len, reserve_type);
>   	if (ret < 0)
>   		btrfs_free_reserved_data_space(inode, start, len);
>   	return ret;
> @@ -6146,6 +6168,8 @@ int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len)
>    * @inode: inode we're releasing space for
>    * @start: start position of the space already reserved
>    * @len: the len of the space already reserved
> + * @reserve_type: this value must be the same as the value passed to
> + * btrfs_delalloc_reserve_space().
>    *
>    * This must be matched with a call to btrfs_delalloc_reserve_space.  This is
>    * called in the case that we don't need the metadata AND data reservations
> @@ -6156,9 +6180,10 @@ int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len)
>    * list if there are no delalloc bytes left.
>    * Also it will handle the qgroup reserved space.
>    */
> -void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len)
> +void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len,
> +		enum btrfs_metadata_reserve_type reserve_type)
>   {
> -	btrfs_delalloc_release_metadata(inode, len);
> +	btrfs_delalloc_release_metadata(inode, len, reserve_type);
>   	btrfs_free_reserved_data_space(inode, start, len);
>   }
>   
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 66a7551..8c07306 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -603,7 +603,7 @@ static int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
>   	btrfs_debug_check_extent_io_range(tree, start, end);
>   
>   	if (bits & EXTENT_DELALLOC)
> -		bits |= EXTENT_NORESERVE;
> +		bits |= EXTENT_NORESERVE | EXTENT_COMPRESS;
>   
>   	if (delete)
>   		bits |= ~EXTENT_CTLBITS;
> @@ -742,6 +742,58 @@ static int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
>   
>   }
>   
> +static void adjust_one_outstanding_extent(struct inode *inode, u64 len)
> +{
> +	unsigned old_extents, new_extents;
> +
> +	old_extents = div64_u64(len + SZ_128K - 1, SZ_128K);
> +	new_extents = div64_u64(len + BTRFS_MAX_EXTENT_SIZE - 1,
> +				BTRFS_MAX_EXTENT_SIZE);
> +	if (old_extents <= new_extents)
> +		return;
> +
> +	spin_lock(&BTRFS_I(inode)->lock);
> +	BTRFS_I(inode)->outstanding_extents -= old_extents - new_extents;
> +	spin_unlock(&BTRFS_I(inode)->lock);
> +}
> +
> +/*
> + * For an extent with EXTENT_COMPRESS flag, if later it does not go through
> + * compress path, we need to adjust the number of outstanding_extents.
> + * It's because for extent with EXTENT_COMPRESS flag, its number of outstanding
> + * extents is calculated by 128KB, so here we need to adjust it.
> + */
> +void adjust_outstanding_extents(struct inode *inode,
> +				u64 start, u64 end)
> +{
> +	struct rb_node *node;
> +	struct extent_state *state;
> +	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
> +
> +	spin_lock(&tree->lock);
> +	node = tree_search(tree, start);
> +	if (!node)
> +		goto out;
> +
> +	while (1) {
> +		state = rb_entry(node, struct extent_state, rb_node);
> +		if (state->start > end)
> +			goto out;
> +		/*
> +		 * The whole range is locked, so we can safely clear
> +		 * EXTENT_COMPRESS flag.
> +		 */
> +		state->state &= ~EXTENT_COMPRESS;
> +		adjust_one_outstanding_extent(inode,
> +				state->end - state->start + 1);
> +		node = rb_next(node);
> +		if (!node)
> +			break;
> +	}
> +out:
> +	spin_unlock(&tree->lock);
> +}
> +
>   static void wait_on_state(struct extent_io_tree *tree,
>   			  struct extent_state *state)
>   		__releases(tree->lock)
> @@ -1504,6 +1556,7 @@ static noinline u64 find_delalloc_range(struct extent_io_tree *tree,
>   	u64 cur_start = *start;
>   	u64 found = 0;
>   	u64 total_bytes = 0;
> +	unsigned pre_state;
>   
>   	spin_lock(&tree->lock);
>   
> @@ -1521,7 +1574,8 @@ static noinline u64 find_delalloc_range(struct extent_io_tree *tree,
>   	while (1) {
>   		state = rb_entry(node, struct extent_state, rb_node);
>   		if (found && (state->start != cur_start ||
> -			      (state->state & EXTENT_BOUNDARY))) {
> +			      (state->state & EXTENT_BOUNDARY) ||
> +			      (state->state ^ pre_state) & EXTENT_COMPRESS)) {
>   			goto out;
>   		}
>   		if (!(state->state & EXTENT_DELALLOC)) {
> @@ -1537,6 +1591,7 @@ static noinline u64 find_delalloc_range(struct extent_io_tree *tree,
>   		found++;
>   		*end = state->end;
>   		cur_start = state->end + 1;
> +		pre_state = state->state;
>   		node = rb_next(node);
>   		total_bytes += state->end - state->start + 1;
>   		if (total_bytes >= max_bytes)
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index ab31d14..d9f846c 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -21,6 +21,7 @@
>   #define EXTENT_NORESERVE	(1U << 15)
>   #define EXTENT_QGROUP_RESERVED	(1U << 16)
>   #define EXTENT_CLEAR_DATA_RESV	(1U << 17)
> +#define	EXTENT_COMPRESS		(1U << 18)
>   #define EXTENT_IOBITS		(EXTENT_LOCKED | EXTENT_WRITEBACK)
>   #define EXTENT_CTLBITS		(EXTENT_DO_ACCOUNTING | EXTENT_FIRST_DELALLOC)
>   
> @@ -247,6 +248,7 @@ int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
>   int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
>   		     unsigned bits, int wake, int delete,
>   		     struct extent_state **cached, gfp_t mask);
> +void adjust_outstanding_extents(struct inode *inode, u64 start, u64 end);
>   
>   static inline int unlock_extent(struct extent_io_tree *tree, u64 start, u64 end)
>   {
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 3a14c87..e4c998d 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -484,11 +484,15 @@ static void btrfs_drop_pages(struct page **pages, size_t num_pages)
>    *
>    * this also makes the decision about creating an inline extent vs
>    * doing real data extents, marking pages dirty and delalloc as required.
> + *
> + * normally flag should be 0, but if a data range will go through compress path,
> + * set flag to 1. Note: here we should ensure enum btrfs_metadata_reserve_type
> + * and flag's values are consistent.
>    */
>   int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
>   			     struct page **pages, size_t num_pages,
>   			     loff_t pos, size_t write_bytes,
> -			     struct extent_state **cached)
> +			     struct extent_state **cached, int flag)
>   {
>   	int err = 0;
>   	int i;
> @@ -503,7 +507,7 @@ int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
>   
>   	end_of_last_block = start_pos + num_bytes - 1;
>   	err = btrfs_set_extent_delalloc(inode, start_pos, end_of_last_block,
> -					cached, 0);
> +					cached, flag);
>   	if (err)
>   		return err;
>   
> @@ -1521,6 +1525,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>   	bool only_release_metadata = false;
>   	bool force_page_uptodate = false;
>   	bool need_unlock;
> +	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
>   
>   	nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
>   			PAGE_SIZE / (sizeof(struct page *)));
> @@ -1530,6 +1535,9 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>   	if (!pages)
>   		return -ENOMEM;
>   
> +	if (inode_need_compress(inode))
> +		reserve_type = BTRFS_RESERVE_COMPRESS;
> +
>   	while (iov_iter_count(i) > 0) {
>   		size_t offset = pos & (PAGE_SIZE - 1);
>   		size_t sector_offset;
> @@ -1583,7 +1591,8 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>   			}
>   		}
>   
> -		ret = btrfs_delalloc_reserve_metadata(inode, reserve_bytes);
> +		ret = btrfs_delalloc_reserve_metadata(inode, reserve_bytes,
> +						      reserve_type);
>   		if (ret) {
>   			if (!only_release_metadata)
>   				btrfs_free_reserved_data_space(inode, pos,
> @@ -1666,14 +1675,16 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>   			}
>   			if (only_release_metadata) {
>   				btrfs_delalloc_release_metadata(inode,
> -								release_bytes);
> +								release_bytes,
> +								reserve_type);
>   			} else {
>   				u64 __pos;
>   
>   				__pos = round_down(pos, root->sectorsize) +
>   					(dirty_pages << PAGE_SHIFT);
>   				btrfs_delalloc_release_space(inode, __pos,
> -							     release_bytes);
> +							     release_bytes,
> +							     reserve_type);
>   			}
>   		}
>   
> @@ -1683,7 +1694,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>   		if (copied > 0)
>   			ret = btrfs_dirty_pages(root, inode, pages,
>   						dirty_pages, pos, copied,
> -						NULL);
> +						NULL, reserve_type);
>   		if (need_unlock)
>   			unlock_extent_cached(&BTRFS_I(inode)->io_tree,
>   					     lockstart, lockend, &cached_state,
> @@ -1724,11 +1735,12 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>   	if (release_bytes) {
>   		if (only_release_metadata) {
>   			btrfs_end_write_no_snapshoting(root);
> -			btrfs_delalloc_release_metadata(inode, release_bytes);
> +			btrfs_delalloc_release_metadata(inode, release_bytes,
> +							reserve_type);
>   		} else {
>   			btrfs_delalloc_release_space(inode,
>   						round_down(pos, root->sectorsize),
> -						release_bytes);
> +						release_bytes, reserve_type);
>   		}
>   	}
>   
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index e4b48f3..e8947c2 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -1297,7 +1297,7 @@ static int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
>   
>   	/* Everything is written out, now we dirty the pages in the file. */
>   	ret = btrfs_dirty_pages(root, inode, io_ctl->pages, io_ctl->num_pages,
> -				0, i_size_read(inode), &cached_state);
> +				0, i_size_read(inode), &cached_state, 0);
>   	if (ret)
>   		goto out_nospc;
>   
> @@ -3516,6 +3516,7 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
>   	int ret;
>   	struct btrfs_io_ctl io_ctl;
>   	bool release_metadata = true;
> +	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
>   
>   	if (!btrfs_test_opt(root->fs_info, INODE_MAP_CACHE))
>   		return 0;
> @@ -3536,7 +3537,8 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
>   
>   	if (ret) {
>   		if (release_metadata)
> -			btrfs_delalloc_release_metadata(inode, inode->i_size);
> +			btrfs_delalloc_release_metadata(inode, inode->i_size,
> +							reserve_type);
>   #ifdef DEBUG
>   		btrfs_err(root->fs_info,
>   			"failed to write free ino cache for root %llu",
> diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
> index d27014b..39c4b1c 100644
> --- a/fs/btrfs/inode-map.c
> +++ b/fs/btrfs/inode-map.c
> @@ -404,6 +404,7 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
>   	int ret;
>   	int prealloc;
>   	bool retry = false;
> +	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
>   
>   	/* only fs tree and subvol/snap needs ino cache */
>   	if (root->root_key.objectid != BTRFS_FS_TREE_OBJECTID &&
> @@ -491,14 +492,14 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
>   	/* Just to make sure we have enough space */
>   	prealloc += 8 * PAGE_SIZE;
>   
> -	ret = btrfs_delalloc_reserve_space(inode, 0, prealloc);
> +	ret = btrfs_delalloc_reserve_space(inode, 0, prealloc, reserve_type);
>   	if (ret)
>   		goto out_put;
>   
>   	ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
>   					      prealloc, prealloc, &alloc_hint);
>   	if (ret) {
> -		btrfs_delalloc_release_metadata(inode, prealloc);
> +		btrfs_delalloc_release_metadata(inode, prealloc, reserve_type);
>   		goto out_put;
>   	}
>   
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index e427b81..608fd84 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -315,7 +315,7 @@ static noinline int cow_file_range_inline(struct btrfs_root *root,
>   	}
>   
>   	set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &BTRFS_I(inode)->runtime_flags);
> -	btrfs_delalloc_release_metadata(inode, end + 1 - start);
> +	btrfs_delalloc_release_metadata(inode, end + 1 - start, 0);
>   	btrfs_drop_extent_cache(inode, start, aligned_end - 1, 0);
>   out:
>   	/*
> @@ -371,7 +371,7 @@ static noinline int add_async_extent(struct async_cow *cow,
>   	return 0;
>   }
>   
> -static inline int inode_need_compress(struct inode *inode)
> +int inode_need_compress(struct inode *inode)
>   {
>   	struct btrfs_root *root = BTRFS_I(inode)->root;
>   
> @@ -710,6 +710,16 @@ static noinline void submit_compressed_extents(struct inode *inode,
>   					 async_extent->start +
>   					 async_extent->ram_size - 1);
>   
> +			/*
> +			 * We use 128KB as max extent size to calculate number
> +			 * of outstanding extents for this extent before, now
> +			 * it'll go through uncompressed IO, we need to use
> +			 * 128MB as max extent size to re-calculate number of
> +			 * outstanding extents for this extent.
> +			 */
> +			adjust_outstanding_extents(inode, async_extent->start,
> +						   async_extent->start +
> +						   async_extent->ram_size - 1);
>   			/* allocate blocks */
>   			ret = cow_file_range(inode, async_cow->locked_page,
>   					     async_extent->start,
> @@ -1573,14 +1583,24 @@ static int run_delalloc_range(struct inode *inode, struct page *locked_page,
>   {
>   	int ret;
>   	int force_cow = need_force_cow(inode, start, end);
> +	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
> +	int need_compress;
>   
> +	need_compress = test_range_bit(io_tree, start, end,
> +				       EXTENT_COMPRESS, 1, NULL);
>   	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW && !force_cow) {
> +		if (need_compress)
> +			adjust_outstanding_extents(inode, start, end);
> +
>   		ret = run_delalloc_nocow(inode, locked_page, start, end,
>   					 page_started, 1, nr_written);
>   	} else if (BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC && !force_cow) {
> +		if (need_compress)
> +			adjust_outstanding_extents(inode, start, end);
> +
>   		ret = run_delalloc_nocow(inode, locked_page, start, end,
>   					 page_started, 0, nr_written);
> -	} else if (!inode_need_compress(inode)) {
> +	} else if (!need_compress) {
>   		ret = cow_file_range(inode, locked_page, start, end, end,
>   				      page_started, nr_written, 1, NULL);
>   	} else {
> @@ -1596,6 +1616,7 @@ static void btrfs_split_extent_hook(struct inode *inode,
>   				    struct extent_state *orig, u64 split)
>   {
>   	u64 size;
> +	u64 max_extent_size = BTRFS_MAX_EXTENT_SIZE;
>   
>   	/* not delalloc, ignore it */
>   	if (!(orig->state & EXTENT_DELALLOC))
> @@ -1604,8 +1625,11 @@ static void btrfs_split_extent_hook(struct inode *inode,
>   	if (btrfs_is_free_space_inode(inode))
>   		return;
>   
> +	if (orig->state & EXTENT_COMPRESS)
> +		max_extent_size = SZ_128K;
> +
>   	size = orig->end - orig->start + 1;
> -	if (size > BTRFS_MAX_EXTENT_SIZE) {
> +	if (size > max_extent_size) {
>   		u64 num_extents;
>   		u64 new_size;
>   
> @@ -1614,13 +1638,13 @@ static void btrfs_split_extent_hook(struct inode *inode,
>   		 * applies here, just in reverse.
>   		 */
>   		new_size = orig->end - split + 1;
> -		num_extents = div64_u64(new_size + BTRFS_MAX_EXTENT_SIZE - 1,
> -					BTRFS_MAX_EXTENT_SIZE);
> +		num_extents = div64_u64(new_size + max_extent_size - 1,
> +					max_extent_size);
>   		new_size = split - orig->start;
> -		num_extents += div64_u64(new_size + BTRFS_MAX_EXTENT_SIZE - 1,
> -					BTRFS_MAX_EXTENT_SIZE);
> -		if (div64_u64(size + BTRFS_MAX_EXTENT_SIZE - 1,
> -			      BTRFS_MAX_EXTENT_SIZE) >= num_extents)
> +		num_extents += div64_u64(new_size + max_extent_size - 1,
> +					 max_extent_size);
> +		if (div64_u64(size + max_extent_size - 1,
> +			      max_extent_size) >= num_extents)
>   			return;
>   	}
>   
> @@ -1641,6 +1665,7 @@ static void btrfs_merge_extent_hook(struct inode *inode,
>   {
>   	u64 new_size, old_size;
>   	u64 num_extents;
> +	u64 max_extent_size = BTRFS_MAX_EXTENT_SIZE;
>   
>   	/* not delalloc, ignore it */
>   	if (!(other->state & EXTENT_DELALLOC))
> @@ -1649,13 +1674,16 @@ static void btrfs_merge_extent_hook(struct inode *inode,
>   	if (btrfs_is_free_space_inode(inode))
>   		return;
>   
> +	if (other->state & EXTENT_COMPRESS)
> +		max_extent_size = SZ_128K;
> +
>   	if (new->start > other->start)
>   		new_size = new->end - other->start + 1;
>   	else
>   		new_size = other->end - new->start + 1;
>   
>   	/* we're not bigger than the max, unreserve the space and go */
> -	if (new_size <= BTRFS_MAX_EXTENT_SIZE) {
> +	if (new_size <= max_extent_size) {
>   		spin_lock(&BTRFS_I(inode)->lock);
>   		BTRFS_I(inode)->outstanding_extents--;
>   		spin_unlock(&BTRFS_I(inode)->lock);
> @@ -1681,14 +1709,14 @@ static void btrfs_merge_extent_hook(struct inode *inode,
>   	 * this case.
>   	 */
>   	old_size = other->end - other->start + 1;
> -	num_extents = div64_u64(old_size + BTRFS_MAX_EXTENT_SIZE - 1,
> -				BTRFS_MAX_EXTENT_SIZE);
> +	num_extents = div64_u64(old_size + max_extent_size - 1,
> +				max_extent_size);
>   	old_size = new->end - new->start + 1;
> -	num_extents += div64_u64(old_size + BTRFS_MAX_EXTENT_SIZE - 1,
> -				 BTRFS_MAX_EXTENT_SIZE);
> +	num_extents += div64_u64(old_size + max_extent_size - 1,
> +				 max_extent_size);
>   
> -	if (div64_u64(new_size + BTRFS_MAX_EXTENT_SIZE - 1,
> -		      BTRFS_MAX_EXTENT_SIZE) >= num_extents)
> +	if (div64_u64(new_size + max_extent_size - 1,
> +		      max_extent_size) >= num_extents)
>   		return;
>   
>   	spin_lock(&BTRFS_I(inode)->lock);
> @@ -1754,10 +1782,15 @@ static void btrfs_set_bit_hook(struct inode *inode,
>   	if (!(state->state & EXTENT_DELALLOC) && (*bits & EXTENT_DELALLOC)) {
>   		struct btrfs_root *root = BTRFS_I(inode)->root;
>   		u64 len = state->end + 1 - state->start;
> -		u64 num_extents = div64_u64(len + BTRFS_MAX_EXTENT_SIZE - 1,
> -					    BTRFS_MAX_EXTENT_SIZE);
> +		u64 max_extent_size = BTRFS_MAX_EXTENT_SIZE;
> +		u64 num_extents;
>   		bool do_list = !btrfs_is_free_space_inode(inode);
>   
> +		if (*bits & EXTENT_COMPRESS)
> +			max_extent_size = SZ_128K;
> +		num_extents = div64_u64(len + max_extent_size - 1,
> +					max_extent_size);
> +
>   		if (*bits & EXTENT_FIRST_DELALLOC)
>   			*bits &= ~EXTENT_FIRST_DELALLOC;
>   
> @@ -1792,8 +1825,9 @@ static void btrfs_clear_bit_hook(struct inode *inode,
>   				 unsigned *bits)
>   {
>   	u64 len = state->end + 1 - state->start;
> -	u64 num_extents = div64_u64(len + BTRFS_MAX_EXTENT_SIZE -1,
> -				    BTRFS_MAX_EXTENT_SIZE);
> +	u64 max_extent_size = BTRFS_MAX_EXTENT_SIZE;
> +	u64 num_extents;
> +	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
>   
>   	spin_lock(&BTRFS_I(inode)->lock);
>   	if ((state->state & EXTENT_DEFRAG) && (*bits & EXTENT_DEFRAG))
> @@ -1809,6 +1843,14 @@ static void btrfs_clear_bit_hook(struct inode *inode,
>   		struct btrfs_root *root = BTRFS_I(inode)->root;
>   		bool do_list = !btrfs_is_free_space_inode(inode);
>   
> +		if (state->state & EXTENT_COMPRESS) {
> +			max_extent_size = SZ_128K;
> +			reserve_type = BTRFS_RESERVE_COMPRESS;
> +		}
> +
> +		num_extents = div64_u64(len + max_extent_size - 1,
> +					max_extent_size);
> +
>   		if (*bits & EXTENT_FIRST_DELALLOC) {
>   			*bits &= ~EXTENT_FIRST_DELALLOC;
>   		} else if (!(*bits & EXTENT_DO_ACCOUNTING) && do_list) {
> @@ -1824,7 +1866,8 @@ static void btrfs_clear_bit_hook(struct inode *inode,
>   		 */
>   		if (*bits & EXTENT_DO_ACCOUNTING &&
>   		    root != root->fs_info->tree_root)
> -			btrfs_delalloc_release_metadata(inode, len);
> +			btrfs_delalloc_release_metadata(inode, len,
> +							reserve_type);
>   
>   		/* For sanity tests. */
>   		if (btrfs_is_testing(root->fs_info))
> @@ -2006,16 +2049,34 @@ static noinline int add_pending_csums(struct btrfs_trans_handle *trans,
>   	return 0;
>   }
>   
> +/*
> + * Normally flag should be 0, but if a data range will go through compress path,
> + * set flag to 1. Note: here we should ensure enum btrfs_metadata_reserve_type
> + * and flag's values are consistent.
> + */
>   int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
> -			      struct extent_state **cached_state, int dedupe)
> +			      struct extent_state **cached_state, int flag)
>   {
>   	int ret;
> -	u64 num_extents = div64_u64(end - start + BTRFS_MAX_EXTENT_SIZE,
> -				    BTRFS_MAX_EXTENT_SIZE);
> +	unsigned bits;
> +	u64 max_extent_size = BTRFS_MAX_EXTENT_SIZE;
> +	u64 num_extents;
> +
> +	if (flag == 1)
> +		max_extent_size = SZ_128K;
> +
> +	num_extents = div64_u64(end - start + max_extent_size,
> +				    max_extent_size);
> +
> +	/* compression path */
> +	if (flag == 1)
> +		bits = EXTENT_DELALLOC | EXTENT_COMPRESS | EXTENT_UPTODATE;
> +	else
> +		bits = EXTENT_DELALLOC | EXTENT_UPTODATE;
>   
>   	WARN_ON((end & (PAGE_SIZE - 1)) == 0);
> -	ret = set_extent_delalloc(&BTRFS_I(inode)->io_tree, start, end,
> -				  cached_state);
> +	ret = set_extent_bit(&BTRFS_I(inode)->io_tree, start, end,
> +			     bits, NULL, cached_state, GFP_NOFS);
>   
>   	/*
>   	 * btrfs_delalloc_reserve_metadata() will first add number of
> @@ -2038,16 +2099,28 @@ int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
>   }
>   
>   int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
> -			    struct extent_state **cached_state)
> +			    struct extent_state **cached_state, int flag)
>   {
>   	int ret;
> -	u64 num_extents = div64_u64(end - start + BTRFS_MAX_EXTENT_SIZE,
> -				    BTRFS_MAX_EXTENT_SIZE);
> +	u64 max_extent_size = BTRFS_MAX_EXTENT_SIZE;
> +	u64 num_extents;
> +	unsigned bits;
> +
> +	if (flag == 1)
> +		max_extent_size = SZ_128K;
> +
> +	num_extents = div64_u64(end - start + max_extent_size,
> +			    max_extent_size);
>   
>   	WARN_ON((end & (PAGE_SIZE - 1)) == 0);
> -	ret = set_extent_defrag(&BTRFS_I(inode)->io_tree, start, end,
> -				cached_state);
> +	if (flag == 1)
> +		bits = EXTENT_DELALLOC | EXTENT_UPTODATE | EXTENT_DEFRAG |
> +				EXTENT_COMPRESS;
> +	else
> +		bits = EXTENT_DELALLOC | EXTENT_UPTODATE | EXTENT_DEFRAG;
>   
> +	ret = set_extent_bit(&BTRFS_I(inode)->io_tree, start, end,
> +			     bits, NULL, cached_state, GFP_NOFS);
>   	if (ret == 0 && !btrfs_is_free_space_inode(inode)) {
>   		spin_lock(&BTRFS_I(inode)->lock);
>   		BTRFS_I(inode)->outstanding_extents -= num_extents;
> @@ -2073,6 +2146,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
>   	u64 page_start;
>   	u64 page_end;
>   	int ret;
> +	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
>   
>   	fixup = container_of(work, struct btrfs_writepage_fixup, work);
>   	page = fixup->page;
> @@ -2105,8 +2179,10 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
>   		goto again;
>   	}
>   
> +	if (inode_need_compress(inode))
> +		reserve_type = BTRFS_RESERVE_COMPRESS;
>   	ret = btrfs_delalloc_reserve_space(inode, page_start,
> -					   PAGE_SIZE);
> +					   PAGE_SIZE, reserve_type);
>   	if (ret) {
>   		mapping_set_error(page->mapping, ret);
>   		end_extent_writepage(page, ret, page_start, page_end);
> @@ -2115,7 +2191,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
>   	 }
>   
>   	btrfs_set_extent_delalloc(inode, page_start, page_end, &cached_state,
> -				  0);
> +				  reserve_type);
>   	ClearPageChecked(page);
>   	set_page_dirty(page);
>   out:
> @@ -2925,6 +3001,7 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
>   	u64 logical_len = ordered_extent->len;
>   	bool nolock;
>   	bool truncated = false;
> +	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
>   
>   	nolock = btrfs_is_free_space_inode(inode);
>   
> @@ -3002,8 +3079,11 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
>   
>   	trans->block_rsv = &root->fs_info->delalloc_block_rsv;
>   
> -	if (test_bit(BTRFS_ORDERED_COMPRESSED, &ordered_extent->flags))
> +	if (test_bit(BTRFS_ORDERED_COMPRESSED, &ordered_extent->flags)) {
>   		compress_type = ordered_extent->compress_type;
> +		reserve_type = BTRFS_RESERVE_COMPRESS;
> +	}
> +
>   	if (test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags)) {
>   		BUG_ON(compress_type);
>   		ret = btrfs_mark_extent_written(trans, inode,
> @@ -3048,7 +3128,8 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
>   			     ordered_extent->len - 1, &cached_state, GFP_NOFS);
>   out:
>   	if (root != root->fs_info->tree_root)
> -		btrfs_delalloc_release_metadata(inode, ordered_extent->len);
> +		btrfs_delalloc_release_metadata(inode, ordered_extent->len,
> +						reserve_type);
>   	if (trans)
>   		btrfs_end_transaction(trans, root);
>   
> @@ -4762,13 +4843,17 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>   	int ret = 0;
>   	u64 block_start;
>   	u64 block_end;
> +	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
> +
> +	if (inode_need_compress(inode))
> +		reserve_type = BTRFS_RESERVE_COMPRESS;
>   
>   	if ((offset & (blocksize - 1)) == 0 &&
>   	    (!len || ((len & (blocksize - 1)) == 0)))
>   		goto out;
>   
>   	ret = btrfs_delalloc_reserve_space(inode,
> -			round_down(from, blocksize), blocksize);
> +			round_down(from, blocksize), blocksize, reserve_type);
>   	if (ret)
>   		goto out;
>   
> @@ -4777,7 +4862,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>   	if (!page) {
>   		btrfs_delalloc_release_space(inode,
>   				round_down(from, blocksize),
> -				blocksize);
> +				blocksize, reserve_type);
>   		ret = -ENOMEM;
>   		goto out;
>   	}
> @@ -4820,7 +4905,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>   			  0, 0, &cached_state, GFP_NOFS);
>   
>   	ret = btrfs_set_extent_delalloc(inode, block_start, block_end,
> -					&cached_state, 0);
> +					&cached_state, reserve_type);
>   	if (ret) {
>   		unlock_extent_cached(io_tree, block_start, block_end,
>   				     &cached_state, GFP_NOFS);
> @@ -4848,7 +4933,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>   out_unlock:
>   	if (ret)
>   		btrfs_delalloc_release_space(inode, block_start,
> -					     blocksize);
> +					     blocksize, reserve_type);
>   	unlock_page(page);
>   	put_page(page);
>   out:
> @@ -8744,7 +8829,8 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>   			inode_unlock(inode);
>   			relock = true;
>   		}
> -		ret = btrfs_delalloc_reserve_space(inode, offset, count);
> +		ret = btrfs_delalloc_reserve_space(inode, offset, count,
> +						   BTRFS_RESERVE_NORMAL);
>   		if (ret)
>   			goto out;
>   		dio_data.outstanding_extents = div64_u64(count +
> @@ -8776,7 +8862,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>   		if (ret < 0 && ret != -EIOCBQUEUED) {
>   			if (dio_data.reserve)
>   				btrfs_delalloc_release_space(inode, offset,
> -							     dio_data.reserve);
> +				     dio_data.reserve, BTRFS_RESERVE_NORMAL);
>   			/*
>   			 * On error we might have left some ordered extents
>   			 * without submitting corresponding bios for them, so
> @@ -8792,7 +8878,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>   					0);
>   		} else if (ret >= 0 && (size_t)ret < count)
>   			btrfs_delalloc_release_space(inode, offset,
> -						     count - (size_t)ret);
> +				     count - (size_t)ret, BTRFS_RESERVE_NORMAL);
>   	}
>   out:
>   	if (wakeup)
> @@ -9035,6 +9121,7 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>   	u64 page_start;
>   	u64 page_end;
>   	u64 end;
> +	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
>   
>   	reserved_space = PAGE_SIZE;
>   
> @@ -9043,6 +9130,8 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>   	page_end = page_start + PAGE_SIZE - 1;
>   	end = page_end;
>   
> +	if (inode_need_compress(inode))
> +		reserve_type = BTRFS_RESERVE_COMPRESS;
>   	/*
>   	 * Reserving delalloc space after obtaining the page lock can lead to
>   	 * deadlock. For example, if a dirty page is locked by this function
> @@ -9052,7 +9141,7 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>   	 * being processed by btrfs_page_mkwrite() function.
>   	 */
>   	ret = btrfs_delalloc_reserve_space(inode, page_start,
> -					   reserved_space);
> +					   reserved_space, reserve_type);
>   	if (!ret) {
>   		ret = file_update_time(vma->vm_file);
>   		reserved = 1;
> @@ -9104,7 +9193,8 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>   			BTRFS_I(inode)->outstanding_extents++;
>   			spin_unlock(&BTRFS_I(inode)->lock);
>   			btrfs_delalloc_release_space(inode, page_start,
> -						PAGE_SIZE - reserved_space);
> +						PAGE_SIZE - reserved_space,
> +						reserve_type);
>   		}
>   	}
>   
> @@ -9121,7 +9211,7 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>   			  0, 0, &cached_state, GFP_NOFS);
>   
>   	ret = btrfs_set_extent_delalloc(inode, page_start, end,
> -					&cached_state, 0);
> +					&cached_state, reserve_type);
>   	if (ret) {
>   		unlock_extent_cached(io_tree, page_start, page_end,
>   				     &cached_state, GFP_NOFS);
> @@ -9159,7 +9249,8 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>   	}
>   	unlock_page(page);
>   out:
> -	btrfs_delalloc_release_space(inode, page_start, reserved_space);
> +	btrfs_delalloc_release_space(inode, page_start, reserved_space,
> +				     reserve_type);
>   out_noreserve:
>   	sb_end_pagefault(inode->i_sb);
>   	return ret;
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index fdf1901..91d121c 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1132,6 +1132,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
>   	struct extent_state *cached_state = NULL;
>   	struct extent_io_tree *tree;
>   	gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
> +	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
>   
>   	file_end = (isize - 1) >> PAGE_SHIFT;
>   	if (!isize || start_index > file_end)
> @@ -1139,9 +1140,11 @@ static int cluster_pages_for_defrag(struct inode *inode,
>   
>   	page_cnt = min_t(u64, (u64)num_pages, (u64)file_end - start_index + 1);
>   
> +	if (inode_need_compress(inode))
> +		reserve_type = BTRFS_RESERVE_COMPRESS;
>   	ret = btrfs_delalloc_reserve_space(inode,
>   			start_index << PAGE_SHIFT,
> -			page_cnt << PAGE_SHIFT);
> +			page_cnt << PAGE_SHIFT, reserve_type);
>   	if (ret)
>   		return ret;
>   	i_done = 0;
> @@ -1232,11 +1235,12 @@ static int cluster_pages_for_defrag(struct inode *inode,
>   		spin_unlock(&BTRFS_I(inode)->lock);
>   		btrfs_delalloc_release_space(inode,
>   				start_index << PAGE_SHIFT,
> -				(page_cnt - i_done) << PAGE_SHIFT);
> +				(page_cnt - i_done) << PAGE_SHIFT,
> +				reserve_type);
>   	}
>   
>   	btrfs_set_extent_defrag(inode, page_start,
> -				page_end - 1, &cached_state);
> +				page_end - 1, &cached_state, reserve_type);
>   	unlock_extent_cached(&BTRFS_I(inode)->io_tree,
>   			     page_start, page_end - 1, &cached_state,
>   			     GFP_NOFS);
> @@ -1257,7 +1261,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
>   	}
>   	btrfs_delalloc_release_space(inode,
>   			start_index << PAGE_SHIFT,
> -			page_cnt << PAGE_SHIFT);
> +			page_cnt << PAGE_SHIFT, reserve_type);
>   	return ret;
>   
>   }
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 0ec8ffa..db90b47 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -3144,10 +3144,14 @@ static int relocate_file_extent_cluster(struct inode *inode,
>   	gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
>   	int nr = 0;
>   	int ret = 0;
> +	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
>   
>   	if (!cluster->nr)
>   		return 0;
>   
> +	if (inode_need_compress(inode))
> +		reserve_type = BTRFS_RESERVE_COMPRESS;
> +
>   	ra = kzalloc(sizeof(*ra), GFP_NOFS);
>   	if (!ra)
>   		return -ENOMEM;
> @@ -3166,7 +3170,8 @@ static int relocate_file_extent_cluster(struct inode *inode,
>   	index = (cluster->start - offset) >> PAGE_SHIFT;
>   	last_index = (cluster->end - offset) >> PAGE_SHIFT;
>   	while (index <= last_index) {
> -		ret = btrfs_delalloc_reserve_metadata(inode, PAGE_SIZE);
> +		ret = btrfs_delalloc_reserve_metadata(inode, PAGE_SIZE,
> +						      reserve_type);
>   		if (ret)
>   			goto out;
>   
> @@ -3179,7 +3184,7 @@ static int relocate_file_extent_cluster(struct inode *inode,
>   						   mask);
>   			if (!page) {
>   				btrfs_delalloc_release_metadata(inode,
> -							PAGE_SIZE);
> +						PAGE_SIZE, reserve_type);
>   				ret = -ENOMEM;
>   				goto out;
>   			}
> @@ -3198,7 +3203,7 @@ static int relocate_file_extent_cluster(struct inode *inode,
>   				unlock_page(page);
>   				put_page(page);
>   				btrfs_delalloc_release_metadata(inode,
> -							PAGE_SIZE);
> +						PAGE_SIZE, reserve_type);
>   				ret = -EIO;
>   				goto out;
>   			}
> @@ -3219,7 +3224,8 @@ static int relocate_file_extent_cluster(struct inode *inode,
>   			nr++;
>   		}
>   
> -		btrfs_set_extent_delalloc(inode, page_start, page_end, NULL, 0);
> +		btrfs_set_extent_delalloc(inode, page_start, page_end, NULL,
> +					  reserve_type);
>   		set_page_dirty(page);
>   
>   		unlock_extent(&BTRFS_I(inode)->io_tree,
> diff --git a/fs/btrfs/tests/inode-tests.c b/fs/btrfs/tests/inode-tests.c
> index 0bf4680..9a1a01d 100644
> --- a/fs/btrfs/tests/inode-tests.c
> +++ b/fs/btrfs/tests/inode-tests.c
> @@ -943,6 +943,7 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
>   	struct inode *inode = NULL;
>   	struct btrfs_root *root = NULL;
>   	int ret = -ENOMEM;
> +	enum btrfs_metadata_reserve_type reserve_type = BTRFS_RESERVE_NORMAL;
>   
>   	inode = btrfs_new_test_inode();
>   	if (!inode) {
> @@ -968,7 +969,7 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
>   	/* [BTRFS_MAX_EXTENT_SIZE] */
>   	BTRFS_I(inode)->outstanding_extents++;
>   	ret = btrfs_set_extent_delalloc(inode, 0, BTRFS_MAX_EXTENT_SIZE - 1,
> -					NULL, 0);
> +					NULL, reserve_type);
>   	if (ret) {
>   		test_msg("btrfs_set_extent_delalloc returned %d\n", ret);
>   		goto out;
> @@ -984,7 +985,7 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
>   	BTRFS_I(inode)->outstanding_extents++;
>   	ret = btrfs_set_extent_delalloc(inode, BTRFS_MAX_EXTENT_SIZE,
>   					BTRFS_MAX_EXTENT_SIZE + sectorsize - 1,
> -					NULL, 0);
> +					NULL, reserve_type);
>   	if (ret) {
>   		test_msg("btrfs_set_extent_delalloc returned %d\n", ret);
>   		goto out;
> @@ -1019,7 +1020,7 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
>   	ret = btrfs_set_extent_delalloc(inode, BTRFS_MAX_EXTENT_SIZE >> 1,
>   					(BTRFS_MAX_EXTENT_SIZE >> 1)
>   					+ sectorsize - 1,
> -					NULL, 0);
> +					NULL, reserve_type);
>   	if (ret) {
>   		test_msg("btrfs_set_extent_delalloc returned %d\n", ret);
>   		goto out;
> @@ -1042,7 +1043,7 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
>   	ret = btrfs_set_extent_delalloc(inode,
>   			BTRFS_MAX_EXTENT_SIZE + 2 * sectorsize,
>   			(BTRFS_MAX_EXTENT_SIZE << 1) + 3 * sectorsize - 1,
> -			NULL, 0);
> +			NULL, reserve_type);
>   	if (ret) {
>   		test_msg("btrfs_set_extent_delalloc returned %d\n", ret);
>   		goto out;
> @@ -1060,7 +1061,8 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
>   	BTRFS_I(inode)->outstanding_extents++;
>   	ret = btrfs_set_extent_delalloc(inode,
>   			BTRFS_MAX_EXTENT_SIZE + sectorsize,
> -			BTRFS_MAX_EXTENT_SIZE + 2 * sectorsize - 1, NULL, 0);
> +			BTRFS_MAX_EXTENT_SIZE + 2 * sectorsize - 1,
> +			NULL, reserve_type);
>   	if (ret) {
>   		test_msg("btrfs_set_extent_delalloc returned %d\n", ret);
>   		goto out;
> @@ -1097,7 +1099,8 @@ static int test_extent_accounting(u32 sectorsize, u32 nodesize)
>   	BTRFS_I(inode)->outstanding_extents++;
>   	ret = btrfs_set_extent_delalloc(inode,
>   			BTRFS_MAX_EXTENT_SIZE + sectorsize,
> -			BTRFS_MAX_EXTENT_SIZE + 2 * sectorsize - 1, NULL, 0);
> +			BTRFS_MAX_EXTENT_SIZE + 2 * sectorsize - 1,
> +			NULL, reserve_type);
>   	if (ret) {
>   		test_msg("btrfs_set_extent_delalloc returned %d\n", ret);
>   		goto out;
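
For readers following the accounting change in the quoted patch, here is a minimal userspace sketch of the round-up computation it applies in the set/clear bit hooks. It is not kernel code. The `demo_` names are hypothetical, and the constants are assumed to mirror SZ_128K and a 128MiB BTRFS_MAX_EXTENT_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants (assumptions): mirror SZ_128K and a 128MiB BTRFS_MAX_EXTENT_SIZE. */
#define DEMO_SZ_128K         (128ULL * 1024)
#define DEMO_MAX_EXTENT_SIZE (128ULL * 1024 * 1024)

/* Round-up division: how many extents of at most max_extent_size cover len bytes. */
static uint64_t demo_count_extents(uint64_t len, uint64_t max_extent_size)
{
	assert(max_extent_size > 0);
	return (len + max_extent_size - 1) / max_extent_size;
}

/*
 * Same count for an inclusive [start, end] range. Because len = end - start + 1,
 * the "- 1" cancels, which matches the "end - start + max_extent_size" form
 * used in btrfs_set_extent_delalloc() in the quoted patch.
 */
static uint64_t demo_count_extents_range(uint64_t start, uint64_t end,
					 uint64_t max_extent_size)
{
	assert(end >= start);
	assert(max_extent_size > 0);
	return (end - start + max_extent_size) / max_extent_size;
}
```

With these assumed values, a 128MiB delalloc range counts as 1 extent under the normal limit and as 1024 extents under the 128KiB compression limit, so outstanding_extents should track the 1024 reserved extents from the example in the commit message.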





Thread overview: 12+ messages
2016-11-01 10:18 [PATCH 1/2] btrfs: improve inode's outstanding_extents computation Wang Xiaoguang
2016-11-01 10:18 ` [PATCH 2/2] btrfs: fix false enospc for compression Wang Xiaoguang
2016-11-01 10:28   ` Wang Xiaoguang [this message]
  -- strict thread matches above, loose matches on Subject: below --
2016-10-06  2:51 [PATCH 1/2] btrfs: improve inode's outstanding_extents computation Wang Xiaoguang
2016-10-06  2:51 ` [PATCH 2/2] btrfs: fix false enospc for compression Wang Xiaoguang
2016-10-06  3:51   ` Wang Xiaoguang
2016-10-12  3:12   ` Wang Xiaoguang
2016-10-17 15:05     ` David Sterba
2016-10-14 13:09   ` Stefan Priebe - Profihost AG
2016-10-14 13:59   ` Holger Hoffstätte
2016-10-17  9:01     ` Wang Xiaoguang
2016-10-19 14:23       ` David Sterba
2016-10-25 10:43         ` Wang Xiaoguang
