linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/10] ext4: simplify the counting and management of delalloc reserved blocks
@ 2024-08-02 11:51 Zhang Yi
  2024-08-02 11:51 ` [PATCH v2 01/10] ext4: factor out ext4_map_create_blocks() to allocate new blocks Zhang Yi
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Zhang Yi @ 2024-08-02 11:51 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

Changes since v1:
 - Just rebase to v6.11-rc1.

This patch series is the part 3 prepartory changes of the buffered IO
iomap conversion, it simplify the counting and updating logic of
delalloc reserved blocks. I picked them out from my buffered IO iomap
conversion RFC series v4[1], and did some minor improvements of commit
messages. This series is based on 6.11-rc1, after it we could save a lot
of code.

Patch 1-3 simplify the delalloc extent management logic by changes to
always set EXT4_GET_BLOCKS_DELALLOC_RESERVE flag when allocating
preallocated blocks, and don't add EXTENT_STATUS_DELAYED flag to an
unwritten extent, which means ext4_es_is_delayed() is equal to
ext4_es_is_delonly().

Patch 4-6 simplify the reserved blocks updating logic by moves the
reserved blocks updating from ext4_{ind|ext}_map_blocks() to
ext4_es_insert_extent().

Patch 7-10 drop the unused code (e.g. ext4_es_is_delonly())and update
comments.

[1] https://lore.kernel.org/linux-ext4/20240410142948.2817554-1-yi.zhang@huaweicloud.com/

Thanks,
Yi.

Zhang Yi (10):
  ext4: factor out ext4_map_create_blocks() to allocate new blocks
  ext4: optimize the EXT4_GET_BLOCKS_DELALLOC_RESERVE flag set
  ext4: don't set EXTENT_STATUS_DELAYED on allocated blocks
  ext4: let __revise_pending() return newly inserted pendings
  ext4: count removed reserved blocks for delalloc only extent entry
  ext4: update delalloc data reserve spcae in ext4_es_insert_extent()
  ext4: drop ext4_es_delayed_clu()
  ext4: use ext4_map_query_blocks() in ext4_map_blocks()
  ext4: drop ext4_es_is_delonly()
  ext4: drop all delonly descriptions

 fs/ext4/extents.c        |  37 ------
 fs/ext4/extents_status.c | 271 ++++++++++++++++-----------------------
 fs/ext4/extents_status.h |   7 -
 fs/ext4/indirect.c       |   7 -
 fs/ext4/inode.c          | 197 +++++++++++++---------------
 5 files changed, 195 insertions(+), 324 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 01/10] ext4: factor out ext4_map_create_blocks() to allocate new blocks
  2024-08-02 11:51 [PATCH v2 00/10] ext4: simplify the counting and management of delalloc reserved blocks Zhang Yi
@ 2024-08-02 11:51 ` Zhang Yi
  2024-08-06 14:38   ` Jan Kara
  2024-08-02 11:51 ` [PATCH v2 02/10] ext4: optimize the EXT4_GET_BLOCKS_DELALLOC_RESERVE flag set Zhang Yi
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Zhang Yi @ 2024-08-02 11:51 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

Factor out a common helper ext4_map_create_blocks() from
ext4_map_blocks() to do a real blocks allocation, no logic changes.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inode.c | 157 +++++++++++++++++++++++++-----------------------
 1 file changed, 81 insertions(+), 76 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 941c1c0d5c6e..112aec171ee9 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -482,6 +482,86 @@ static int ext4_map_query_blocks(handle_t *handle, struct inode *inode,
 	return retval;
 }
 
+static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
+				  struct ext4_map_blocks *map, int flags)
+{
+	struct extent_status es;
+	unsigned int status;
+	int err, retval = 0;
+
+	/*
+	 * Here we clear m_flags because after allocating an new extent,
+	 * it will be set again.
+	 */
+	map->m_flags &= ~EXT4_MAP_FLAGS;
+
+	/*
+	 * We need to check for EXT4 here because migrate could have
+	 * changed the inode type in between.
+	 */
+	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
+		retval = ext4_ext_map_blocks(handle, inode, map, flags);
+	} else {
+		retval = ext4_ind_map_blocks(handle, inode, map, flags);
+
+		/*
+		 * We allocated new blocks which will result in i_data's
+		 * format changing. Force the migrate to fail by clearing
+		 * migrate flags.
+		 */
+		if (retval > 0 && map->m_flags & EXT4_MAP_NEW)
+			ext4_clear_inode_state(inode, EXT4_STATE_EXT_MIGRATE);
+	}
+	if (retval <= 0)
+		return retval;
+
+	if (unlikely(retval != map->m_len)) {
+		ext4_warning(inode->i_sb,
+			     "ES len assertion failed for inode %lu: "
+			     "retval %d != map->m_len %d",
+			     inode->i_ino, retval, map->m_len);
+		WARN_ON(1);
+	}
+
+	/*
+	 * We have to zeroout blocks before inserting them into extent
+	 * status tree. Otherwise someone could look them up there and
+	 * use them before they are really zeroed. We also have to
+	 * unmap metadata before zeroing as otherwise writeback can
+	 * overwrite zeros with stale data from block device.
+	 */
+	if (flags & EXT4_GET_BLOCKS_ZERO &&
+	    map->m_flags & EXT4_MAP_MAPPED && map->m_flags & EXT4_MAP_NEW) {
+		err = ext4_issue_zeroout(inode, map->m_lblk, map->m_pblk,
+					 map->m_len);
+		if (err)
+			return err;
+	}
+
+	/*
+	 * If the extent has been zeroed out, we don't need to update
+	 * extent status tree.
+	 */
+	if (flags & EXT4_GET_BLOCKS_PRE_IO &&
+	    ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
+		if (ext4_es_is_written(&es))
+			return retval;
+	}
+
+	status = map->m_flags & EXT4_MAP_UNWRITTEN ?
+			EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
+	if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
+	    !(status & EXTENT_STATUS_WRITTEN) &&
+	    ext4_es_scan_range(inode, &ext4_es_is_delayed, map->m_lblk,
+			       map->m_lblk + map->m_len - 1))
+		status |= EXTENT_STATUS_DELAYED;
+
+	ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
+			      map->m_pblk, status);
+
+	return retval;
+}
+
 /*
  * The ext4_map_blocks() function tries to look up the requested blocks,
  * and returns if the blocks are already mapped.
@@ -630,12 +710,6 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 		if (!(flags & EXT4_GET_BLOCKS_CONVERT_UNWRITTEN))
 			return retval;
 
-	/*
-	 * Here we clear m_flags because after allocating an new extent,
-	 * it will be set again.
-	 */
-	map->m_flags &= ~EXT4_MAP_FLAGS;
-
 	/*
 	 * New blocks allocate and/or writing to unwritten extent
 	 * will possibly result in updating i_data, so we take
@@ -643,76 +717,7 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 	 * with create == 1 flag.
 	 */
 	down_write(&EXT4_I(inode)->i_data_sem);
-
-	/*
-	 * We need to check for EXT4 here because migrate
-	 * could have changed the inode type in between
-	 */
-	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
-		retval = ext4_ext_map_blocks(handle, inode, map, flags);
-	} else {
-		retval = ext4_ind_map_blocks(handle, inode, map, flags);
-
-		if (retval > 0 && map->m_flags & EXT4_MAP_NEW) {
-			/*
-			 * We allocated new blocks which will result in
-			 * i_data's format changing.  Force the migrate
-			 * to fail by clearing migrate flags
-			 */
-			ext4_clear_inode_state(inode, EXT4_STATE_EXT_MIGRATE);
-		}
-	}
-
-	if (retval > 0) {
-		unsigned int status;
-
-		if (unlikely(retval != map->m_len)) {
-			ext4_warning(inode->i_sb,
-				     "ES len assertion failed for inode "
-				     "%lu: retval %d != map->m_len %d",
-				     inode->i_ino, retval, map->m_len);
-			WARN_ON(1);
-		}
-
-		/*
-		 * We have to zeroout blocks before inserting them into extent
-		 * status tree. Otherwise someone could look them up there and
-		 * use them before they are really zeroed. We also have to
-		 * unmap metadata before zeroing as otherwise writeback can
-		 * overwrite zeros with stale data from block device.
-		 */
-		if (flags & EXT4_GET_BLOCKS_ZERO &&
-		    map->m_flags & EXT4_MAP_MAPPED &&
-		    map->m_flags & EXT4_MAP_NEW) {
-			ret = ext4_issue_zeroout(inode, map->m_lblk,
-						 map->m_pblk, map->m_len);
-			if (ret) {
-				retval = ret;
-				goto out_sem;
-			}
-		}
-
-		/*
-		 * If the extent has been zeroed out, we don't need to update
-		 * extent status tree.
-		 */
-		if ((flags & EXT4_GET_BLOCKS_PRE_IO) &&
-		    ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
-			if (ext4_es_is_written(&es))
-				goto out_sem;
-		}
-		status = map->m_flags & EXT4_MAP_UNWRITTEN ?
-				EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
-		if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
-		    !(status & EXTENT_STATUS_WRITTEN) &&
-		    ext4_es_scan_range(inode, &ext4_es_is_delayed, map->m_lblk,
-				       map->m_lblk + map->m_len - 1))
-			status |= EXTENT_STATUS_DELAYED;
-		ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
-				      map->m_pblk, status);
-	}
-
-out_sem:
+	retval = ext4_map_create_blocks(handle, inode, map, flags);
 	up_write((&EXT4_I(inode)->i_data_sem));
 	if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
 		ret = check_block_validity(inode, map);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 02/10] ext4: optimize the EXT4_GET_BLOCKS_DELALLOC_RESERVE flag set
  2024-08-02 11:51 [PATCH v2 00/10] ext4: simplify the counting and management of delalloc reserved blocks Zhang Yi
  2024-08-02 11:51 ` [PATCH v2 01/10] ext4: factor out ext4_map_create_blocks() to allocate new blocks Zhang Yi
@ 2024-08-02 11:51 ` Zhang Yi
  2024-08-06 14:48   ` Jan Kara
  2024-08-02 11:51 ` [PATCH v2 03/10] ext4: don't set EXTENT_STATUS_DELAYED on allocated blocks Zhang Yi
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Zhang Yi @ 2024-08-02 11:51 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

When doing block allocation, magic EXT4_GET_BLOCKS_DELALLOC_RESERVE
means the allocating range covers a range of delayed allocated clusters,
the blocks and quotas have already been reserved in ext4_da_map_blocks(),
we should update the reserved space and don't need to claim them again.

At the moment, we only set this magic in mpage_map_one_extent() when
allocating a range of delayed allocated clusters in the write back path,
it makes things complicated since we have to notice and deal with the
case of allocating non-delayed allocated clusters separately in
ext4_ext_map_blocks(). For example, it we fallocate some blocks that
have been delayed allocated, free space would be claimed again in
ext4_mb_new_blocks() (this is wrong exactily), and we can't claim quota
space again, we have to release the quota reservations made for that
previously delayed allocated clusters.

Move the position thats set the EXT4_GET_BLOCKS_DELALLOC_RESERVE to
where we actually do block allocation, it could simplify above handling
a lot, it means that we always set this magic once the allocation range
covers delalloc blocks, no need to take care of the allocation path.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inode.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 112aec171ee9..91b2610a6dc5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -489,6 +489,14 @@ static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
 	unsigned int status;
 	int err, retval = 0;
 
+	/*
+	 * We pass in the magic EXT4_GET_BLOCKS_DELALLOC_RESERVE
+	 * indicates that the blocks and quotas has already been
+	 * checked when the data was copied into the page cache.
+	 */
+	if (map->m_flags & EXT4_MAP_DELAYED)
+		flags |= EXT4_GET_BLOCKS_DELALLOC_RESERVE;
+
 	/*
 	 * Here we clear m_flags because after allocating an new extent,
 	 * it will be set again.
@@ -2224,11 +2232,6 @@ static int mpage_map_one_extent(handle_t *handle, struct mpage_da_data *mpd)
 	 * writeback and there is nothing we can do about it so it might result
 	 * in data loss.  So use reserved blocks to allocate metadata if
 	 * possible.
-	 *
-	 * We pass in the magic EXT4_GET_BLOCKS_DELALLOC_RESERVE if
-	 * the blocks in question are delalloc blocks.  This indicates
-	 * that the blocks and quotas has already been checked when
-	 * the data was copied into the page cache.
 	 */
 	get_blocks_flags = EXT4_GET_BLOCKS_CREATE |
 			   EXT4_GET_BLOCKS_METADATA_NOFAIL |
@@ -2236,8 +2239,6 @@ static int mpage_map_one_extent(handle_t *handle, struct mpage_da_data *mpd)
 	dioread_nolock = ext4_should_dioread_nolock(inode);
 	if (dioread_nolock)
 		get_blocks_flags |= EXT4_GET_BLOCKS_IO_CREATE_EXT;
-	if (map->m_flags & BIT(BH_Delay))
-		get_blocks_flags |= EXT4_GET_BLOCKS_DELALLOC_RESERVE;
 
 	err = ext4_map_blocks(handle, inode, map, get_blocks_flags);
 	if (err < 0)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 03/10] ext4: don't set EXTENT_STATUS_DELAYED on allocated blocks
  2024-08-02 11:51 [PATCH v2 00/10] ext4: simplify the counting and management of delalloc reserved blocks Zhang Yi
  2024-08-02 11:51 ` [PATCH v2 01/10] ext4: factor out ext4_map_create_blocks() to allocate new blocks Zhang Yi
  2024-08-02 11:51 ` [PATCH v2 02/10] ext4: optimize the EXT4_GET_BLOCKS_DELALLOC_RESERVE flag set Zhang Yi
@ 2024-08-02 11:51 ` Zhang Yi
  2024-08-06 15:23   ` Jan Kara
  2024-08-02 11:51 ` [PATCH v2 04/10] ext4: let __revise_pending() return newly inserted pendings Zhang Yi
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Zhang Yi @ 2024-08-02 11:51 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

Since we always set EXT4_GET_BLOCKS_DELALLOC_RESERVE when allocating
delalloc blocks, there is no need to keep delayed flag on the unwritten
extent status entry, so just drop it after allocation.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/extents_status.c |  9 +--------
 fs/ext4/inode.c          | 11 -----------
 2 files changed, 1 insertion(+), 19 deletions(-)

diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 17dcf13adde2..024cd37d53b3 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -869,14 +869,7 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
 		return;
 
 	BUG_ON(end < lblk);
-
-	if ((status & EXTENT_STATUS_DELAYED) &&
-	    (status & EXTENT_STATUS_WRITTEN)) {
-		ext4_warning(inode->i_sb, "Inserting extent [%u/%u] as "
-				" delayed and written which can potentially "
-				" cause data loss.", lblk, len);
-		WARN_ON(1);
-	}
+	WARN_ON_ONCE(status & EXTENT_STATUS_DELAYED);
 
 	newes.es_lblk = lblk;
 	newes.es_len = len;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 91b2610a6dc5..e9ce1e4e6acb 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -558,12 +558,6 @@ static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
 
 	status = map->m_flags & EXT4_MAP_UNWRITTEN ?
 			EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
-	if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
-	    !(status & EXTENT_STATUS_WRITTEN) &&
-	    ext4_es_scan_range(inode, &ext4_es_is_delayed, map->m_lblk,
-			       map->m_lblk + map->m_len - 1))
-		status |= EXTENT_STATUS_DELAYED;
-
 	ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
 			      map->m_pblk, status);
 
@@ -682,11 +676,6 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 
 		status = map->m_flags & EXT4_MAP_UNWRITTEN ?
 				EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
-		if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
-		    !(status & EXTENT_STATUS_WRITTEN) &&
-		    ext4_es_scan_range(inode, &ext4_es_is_delayed, map->m_lblk,
-				       map->m_lblk + map->m_len - 1))
-			status |= EXTENT_STATUS_DELAYED;
 		ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
 				      map->m_pblk, status);
 	}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 04/10] ext4: let __revise_pending() return newly inserted pendings
  2024-08-02 11:51 [PATCH v2 00/10] ext4: simplify the counting and management of delalloc reserved blocks Zhang Yi
                   ` (2 preceding siblings ...)
  2024-08-02 11:51 ` [PATCH v2 03/10] ext4: don't set EXTENT_STATUS_DELAYED on allocated blocks Zhang Yi
@ 2024-08-02 11:51 ` Zhang Yi
  2024-08-02 11:51 ` [PATCH v2 05/10] ext4: count removed reserved blocks for delalloc only extent entry Zhang Yi
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Zhang Yi @ 2024-08-02 11:51 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

Let __insert_pending() return 1 after successfully inserting a new
pending cluster, and also let __revise_pending() to return the number of
of newly inserted pendings.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/extents_status.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 024cd37d53b3..4d24b56cfaf0 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -887,7 +887,7 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
 		es1 = __es_alloc_extent(true);
 	if ((err1 || err2) && !es2)
 		es2 = __es_alloc_extent(true);
-	if ((err1 || err2 || err3) && revise_pending && !pr)
+	if ((err1 || err2 || err3 < 0) && revise_pending && !pr)
 		pr = __alloc_pending(true);
 	write_lock(&EXT4_I(inode)->i_es_lock);
 
@@ -915,7 +915,7 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
 
 	if (revise_pending) {
 		err3 = __revise_pending(inode, lblk, len, &pr);
-		if (err3 != 0)
+		if (err3 < 0)
 			goto error;
 		if (pr) {
 			__free_pending(pr);
@@ -924,7 +924,7 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
 	}
 error:
 	write_unlock(&EXT4_I(inode)->i_es_lock);
-	if (err1 || err2 || err3)
+	if (err1 || err2 || err3 < 0)
 		goto retry;
 
 	ext4_es_print_tree(inode);
@@ -1933,7 +1933,7 @@ static struct pending_reservation *__get_pending(struct inode *inode,
  * @lblk - logical block in the cluster to be added
  * @prealloc - preallocated pending entry
  *
- * Returns 0 on successful insertion and -ENOMEM on failure.  If the
+ * Returns 1 on successful insertion and -ENOMEM on failure.  If the
  * pending reservation is already in the set, returns successfully.
  */
 static int __insert_pending(struct inode *inode, ext4_lblk_t lblk,
@@ -1977,6 +1977,7 @@ static int __insert_pending(struct inode *inode, ext4_lblk_t lblk,
 
 	rb_link_node(&pr->rb_node, parent, p);
 	rb_insert_color(&pr->rb_node, &tree->root);
+	ret = 1;
 
 out:
 	return ret;
@@ -2098,7 +2099,7 @@ void ext4_es_insert_delayed_extent(struct inode *inode, ext4_lblk_t lblk,
 		es1 = __es_alloc_extent(true);
 	if ((err1 || err2) && !es2)
 		es2 = __es_alloc_extent(true);
-	if (err1 || err2 || err3) {
+	if (err1 || err2 || err3 < 0) {
 		if (lclu_allocated && !pr1)
 			pr1 = __alloc_pending(true);
 		if (end_allocated && !pr2)
@@ -2128,7 +2129,7 @@ void ext4_es_insert_delayed_extent(struct inode *inode, ext4_lblk_t lblk,
 
 	if (lclu_allocated) {
 		err3 = __insert_pending(inode, lblk, &pr1);
-		if (err3 != 0)
+		if (err3 < 0)
 			goto error;
 		if (pr1) {
 			__free_pending(pr1);
@@ -2137,7 +2138,7 @@ void ext4_es_insert_delayed_extent(struct inode *inode, ext4_lblk_t lblk,
 	}
 	if (end_allocated) {
 		err3 = __insert_pending(inode, end, &pr2);
-		if (err3 != 0)
+		if (err3 < 0)
 			goto error;
 		if (pr2) {
 			__free_pending(pr2);
@@ -2146,7 +2147,7 @@ void ext4_es_insert_delayed_extent(struct inode *inode, ext4_lblk_t lblk,
 	}
 error:
 	write_unlock(&EXT4_I(inode)->i_es_lock);
-	if (err1 || err2 || err3)
+	if (err1 || err2 || err3 < 0)
 		goto retry;
 
 	ext4_es_print_tree(inode);
@@ -2256,7 +2257,9 @@ unsigned int ext4_es_delayed_clu(struct inode *inode, ext4_lblk_t lblk,
  *
  * Used after a newly allocated extent is added to the extents status tree.
  * Requires that the extents in the range have either written or unwritten
- * status.  Must be called while holding i_es_lock.
+ * status.  Must be called while holding i_es_lock. Returns number of new
+ * inserts pending cluster on insert pendings, returns 0 on remove pendings,
+ * return -ENOMEM on failure.
  */
 static int __revise_pending(struct inode *inode, ext4_lblk_t lblk,
 			    ext4_lblk_t len,
@@ -2266,6 +2269,7 @@ static int __revise_pending(struct inode *inode, ext4_lblk_t lblk,
 	ext4_lblk_t end = lblk + len - 1;
 	ext4_lblk_t first, last;
 	bool f_del = false, l_del = false;
+	int pendings = 0;
 	int ret = 0;
 
 	if (len == 0)
@@ -2293,6 +2297,7 @@ static int __revise_pending(struct inode *inode, ext4_lblk_t lblk,
 			ret = __insert_pending(inode, first, prealloc);
 			if (ret < 0)
 				goto out;
+			pendings += ret;
 		} else {
 			last = EXT4_LBLK_CMASK(sbi, end) +
 			       sbi->s_cluster_ratio - 1;
@@ -2304,6 +2309,7 @@ static int __revise_pending(struct inode *inode, ext4_lblk_t lblk,
 				ret = __insert_pending(inode, last, prealloc);
 				if (ret < 0)
 					goto out;
+				pendings += ret;
 			} else
 				__remove_pending(inode, last);
 		}
@@ -2316,6 +2322,7 @@ static int __revise_pending(struct inode *inode, ext4_lblk_t lblk,
 			ret = __insert_pending(inode, first, prealloc);
 			if (ret < 0)
 				goto out;
+			pendings += ret;
 		} else
 			__remove_pending(inode, first);
 
@@ -2327,9 +2334,10 @@ static int __revise_pending(struct inode *inode, ext4_lblk_t lblk,
 			ret = __insert_pending(inode, last, prealloc);
 			if (ret < 0)
 				goto out;
+			pendings += ret;
 		} else
 			__remove_pending(inode, last);
 	}
 out:
-	return ret;
+	return (ret < 0) ? ret : pendings;
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 05/10] ext4: count removed reserved blocks for delalloc only extent entry
  2024-08-02 11:51 [PATCH v2 00/10] ext4: simplify the counting and management of delalloc reserved blocks Zhang Yi
                   ` (3 preceding siblings ...)
  2024-08-02 11:51 ` [PATCH v2 04/10] ext4: let __revise_pending() return newly inserted pendings Zhang Yi
@ 2024-08-02 11:51 ` Zhang Yi
  2024-08-02 11:51 ` [PATCH v2 06/10] ext4: update delalloc data reserve spcae in ext4_es_insert_extent() Zhang Yi
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Zhang Yi @ 2024-08-02 11:51 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

If bigalloc feature is enabled, __es_remove_extent() only counts
reserved clusters when removing delalloc extent entry, it doesn't count
reserved blocks. However, it's useful to distinguish whether we are
allocating blocks that contains a delalloc range in one cluster, so
let's count the reserved blocks number too.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/extents_status.c | 64 +++++++++++++++++++++++++---------------
 1 file changed, 40 insertions(+), 24 deletions(-)

diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 4d24b56cfaf0..3107e07ffe46 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -141,13 +141,18 @@
  *   -- Extent-level locking
  */
 
+struct rsvd_info {
+	int delonly_cluster;	/* reserved clusters for delalloc es entry */
+	int delonly_block;	/* reserved blocks for delalloc es entry */
+};
+
 static struct kmem_cache *ext4_es_cachep;
 static struct kmem_cache *ext4_pending_cachep;
 
 static int __es_insert_extent(struct inode *inode, struct extent_status *newes,
 			      struct extent_status *prealloc);
 static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
-			      ext4_lblk_t end, int *reserved,
+			      ext4_lblk_t end, struct rsvd_info *rinfo,
 			      struct extent_status *prealloc);
 static int es_reclaim_extents(struct ext4_inode_info *ei, int *nr_to_scan);
 static int __es_shrink(struct ext4_sb_info *sbi, int nr_to_scan,
@@ -1044,7 +1049,8 @@ int ext4_es_lookup_extent(struct inode *inode, ext4_lblk_t lblk,
 }
 
 struct rsvd_count {
-	int ndelonly;
+	int ndelonly_cluster;
+	int ndelonly_block;
 	bool first_do_lblk_found;
 	ext4_lblk_t first_do_lblk;
 	ext4_lblk_t last_do_lblk;
@@ -1070,7 +1076,8 @@ static void init_rsvd(struct inode *inode, ext4_lblk_t lblk,
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	struct rb_node *node;
 
-	rc->ndelonly = 0;
+	rc->ndelonly_cluster = 0;
+	rc->ndelonly_block = 0;
 
 	/*
 	 * for bigalloc, note the first delonly block in the range has not
@@ -1118,11 +1125,13 @@ static void count_rsvd(struct inode *inode, ext4_lblk_t lblk, long len,
 	WARN_ON(len <= 0);
 
 	if (sbi->s_cluster_ratio == 1) {
-		rc->ndelonly += (int) len;
+		rc->ndelonly_cluster += (int) len;
+		rc->ndelonly_block = rc->ndelonly_cluster;
 		return;
 	}
 
 	/* bigalloc */
+	rc->ndelonly_block += (int)len;
 
 	i = (lblk < es->es_lblk) ? es->es_lblk : lblk;
 	end = lblk + (ext4_lblk_t) len - 1;
@@ -1142,7 +1151,7 @@ static void count_rsvd(struct inode *inode, ext4_lblk_t lblk, long len,
 	 * doesn't start with it, count it and stop tracking
 	 */
 	if (rc->partial && (rc->lclu != EXT4_B2C(sbi, i))) {
-		rc->ndelonly++;
+		rc->ndelonly_cluster++;
 		rc->partial = false;
 	}
 
@@ -1152,7 +1161,7 @@ static void count_rsvd(struct inode *inode, ext4_lblk_t lblk, long len,
 	 */
 	if (EXT4_LBLK_COFF(sbi, i) != 0) {
 		if (end >= EXT4_LBLK_CFILL(sbi, i)) {
-			rc->ndelonly++;
+			rc->ndelonly_cluster++;
 			rc->partial = false;
 			i = EXT4_LBLK_CFILL(sbi, i) + 1;
 		}
@@ -1164,7 +1173,7 @@ static void count_rsvd(struct inode *inode, ext4_lblk_t lblk, long len,
 	 */
 	if ((i + sbi->s_cluster_ratio - 1) <= end) {
 		nclu = (end - i + 1) >> sbi->s_cluster_bits;
-		rc->ndelonly += nclu;
+		rc->ndelonly_cluster += nclu;
 		i += nclu << sbi->s_cluster_bits;
 	}
 
@@ -1244,9 +1253,9 @@ static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end,
 	if (sbi->s_cluster_ratio > 1) {
 		/* count any remaining partial cluster */
 		if (rc->partial)
-			rc->ndelonly++;
+			rc->ndelonly_cluster++;
 
-		if (rc->ndelonly == 0)
+		if (rc->ndelonly_cluster == 0)
 			return 0;
 
 		first_lclu = EXT4_B2C(sbi, rc->first_do_lblk);
@@ -1263,7 +1272,7 @@ static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end,
 		while (es && ext4_es_end(es) >=
 		       EXT4_LBLK_CMASK(sbi, rc->first_do_lblk)) {
 			if (ext4_es_is_delonly(es)) {
-				rc->ndelonly--;
+				rc->ndelonly_cluster--;
 				left_delonly = true;
 				break;
 			}
@@ -1283,7 +1292,7 @@ static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end,
 			while (es && es->es_lblk <=
 			       EXT4_LBLK_CFILL(sbi, rc->last_do_lblk)) {
 				if (ext4_es_is_delonly(es)) {
-					rc->ndelonly--;
+					rc->ndelonly_cluster--;
 					right_delonly = true;
 					break;
 				}
@@ -1329,7 +1338,7 @@ static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end,
 		if (count_pending) {
 			pr = __pr_tree_search(&tree->root, first_lclu);
 			while (pr && pr->lclu <= last_lclu) {
-				rc->ndelonly--;
+				rc->ndelonly_cluster--;
 				node = rb_next(&pr->rb_node);
 				rb_erase(&pr->rb_node, &tree->root);
 				__free_pending(pr);
@@ -1340,7 +1349,7 @@ static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end,
 			}
 		}
 	}
-	return rc->ndelonly;
+	return rc->ndelonly_cluster;
 }
 
 
@@ -1350,16 +1359,17 @@ static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end,
  * @inode - file containing range
  * @lblk - first block in range
  * @end - last block in range
- * @reserved - number of cluster reservations released
+ * @rinfo - reserved information collected, includes number of
+ *          block/cluster reservations released
  * @prealloc - pre-allocated es to avoid memory allocation failures
  *
- * If @reserved is not NULL and delayed allocation is enabled, counts
+ * If @rinfo is not NULL and delayed allocation is enabled, counts
  * block/cluster reservations freed by removing range and if bigalloc
  * enabled cancels pending reservations as needed. Returns 0 on success,
  * error code on failure.
  */
 static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
-			      ext4_lblk_t end, int *reserved,
+			      ext4_lblk_t end, struct rsvd_info *rinfo,
 			      struct extent_status *prealloc)
 {
 	struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
@@ -1369,11 +1379,15 @@ static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
 	ext4_lblk_t len1, len2;
 	ext4_fsblk_t block;
 	int err = 0;
-	bool count_reserved = true;
+	bool count_reserved = false;
 	struct rsvd_count rc;
 
-	if (reserved == NULL || !test_opt(inode->i_sb, DELALLOC))
-		count_reserved = false;
+	if (rinfo) {
+		rinfo->delonly_cluster = 0;
+		rinfo->delonly_block = 0;
+		if (test_opt(inode->i_sb, DELALLOC))
+			count_reserved = true;
+	}
 
 	es = __es_tree_search(&tree->root, lblk);
 	if (!es)
@@ -1471,8 +1485,10 @@ static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
 	}
 
 out_get_reserved:
-	if (count_reserved)
-		*reserved = get_rsvd(inode, end, es, &rc);
+	if (count_reserved) {
+		rinfo->delonly_cluster = get_rsvd(inode, end, es, &rc);
+		rinfo->delonly_block = rc.ndelonly_block;
+	}
 out:
 	return err;
 }
@@ -1491,8 +1507,8 @@ void ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
 			   ext4_lblk_t len)
 {
 	ext4_lblk_t end;
+	struct rsvd_info rinfo;
 	int err = 0;
-	int reserved = 0;
 	struct extent_status *es = NULL;
 
 	if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
@@ -1517,7 +1533,7 @@ void ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
 	 * is reclaimed.
 	 */
 	write_lock(&EXT4_I(inode)->i_es_lock);
-	err = __es_remove_extent(inode, lblk, end, &reserved, es);
+	err = __es_remove_extent(inode, lblk, end, &rinfo, es);
 	/* Free preallocated extent if it didn't get used. */
 	if (es) {
 		if (!es->es_len)
@@ -1529,7 +1545,7 @@ void ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
 		goto retry;
 
 	ext4_es_print_tree(inode);
-	ext4_da_release_space(inode, reserved);
+	ext4_da_release_space(inode, rinfo.delonly_cluster);
 	return;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 06/10] ext4: update delalloc data reserve spcae in ext4_es_insert_extent()
  2024-08-02 11:51 [PATCH v2 00/10] ext4: simplify the counting and management of delalloc reserved blocks Zhang Yi
                   ` (4 preceding siblings ...)
  2024-08-02 11:51 ` [PATCH v2 05/10] ext4: count removed reserved blocks for delalloc only extent entry Zhang Yi
@ 2024-08-02 11:51 ` Zhang Yi
  2024-08-07 17:41   ` Jan Kara
  2024-08-02 11:51 ` [PATCH v2 07/10] ext4: drop ext4_es_delayed_clu() Zhang Yi
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Zhang Yi @ 2024-08-02 11:51 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

Now that we update data reserved space for delalloc after allocating
new blocks in ext4_{ind|ext}_map_blocks(), and if bigalloc feature is
enabled, we also need to query the extents_status tree to calculate the
exact reserved clusters. This is complicated now and it appears that
it's better to do this job in ext4_es_insert_extent(), because
__es_remove_extent() have already count delalloc blocks when removing
delalloc extents and __revise_pending() return new adding pending count,
we could update the reserved blocks easily in ext4_es_insert_extent().

Thers is one special case needs to concern is the quota claiming, when
bigalloc is enabled, if the delayed cluster allocation has been raced
by another no-delayed allocation(e.g. from fallocate) which doesn't
cover the delayed blocks:

  |<       one cluster       >|
  hhhhhhhhhhhhhhhhhhhdddddddddd
  ^            ^
  |<          >| < fallocate this range, don't claim quota again

We can't claim quota as usual because the fallocate has already claimed
it in ext4_mb_new_blocks(), we could notice this case through the
removed delalloc blocks count.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/extents.c        | 37 -------------------------------------
 fs/ext4/extents_status.c | 22 +++++++++++++++++++++-
 fs/ext4/indirect.c       |  7 -------
 3 files changed, 21 insertions(+), 45 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index e067f2dd0335..a58240fdfe3f 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4356,43 +4356,6 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 		goto out;
 	}
 
-	/*
-	 * Reduce the reserved cluster count to reflect successful deferred
-	 * allocation of delayed allocated clusters or direct allocation of
-	 * clusters discovered to be delayed allocated.  Once allocated, a
-	 * cluster is not included in the reserved count.
-	 */
-	if (test_opt(inode->i_sb, DELALLOC) && allocated_clusters) {
-		if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) {
-			/*
-			 * When allocating delayed allocated clusters, simply
-			 * reduce the reserved cluster count and claim quota
-			 */
-			ext4_da_update_reserve_space(inode, allocated_clusters,
-							1);
-		} else {
-			ext4_lblk_t lblk, len;
-			unsigned int n;
-
-			/*
-			 * When allocating non-delayed allocated clusters
-			 * (from fallocate, filemap, DIO, or clusters
-			 * allocated when delalloc has been disabled by
-			 * ext4_nonda_switch), reduce the reserved cluster
-			 * count by the number of allocated clusters that
-			 * have previously been delayed allocated.  Quota
-			 * has been claimed by ext4_mb_new_blocks() above,
-			 * so release the quota reservations made for any
-			 * previously delayed allocated clusters.
-			 */
-			lblk = EXT4_LBLK_CMASK(sbi, map->m_lblk);
-			len = allocated_clusters << sbi->s_cluster_bits;
-			n = ext4_es_delayed_clu(inode, lblk, len);
-			if (n > 0)
-				ext4_da_update_reserve_space(inode, (int) n, 0);
-		}
-	}
-
 	/*
 	 * Cache the extent and update transaction to commit on fdatasync only
 	 * when it is _not_ an unwritten extent.
diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 3107e07ffe46..2daf61cfcf58 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -858,6 +858,8 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
 	struct extent_status newes;
 	ext4_lblk_t end = lblk + len - 1;
 	int err1 = 0, err2 = 0, err3 = 0;
+	struct rsvd_info rinfo;
+	int resv_used, pending = 0;
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	struct extent_status *es1 = NULL;
 	struct extent_status *es2 = NULL;
@@ -896,7 +898,7 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
 		pr = __alloc_pending(true);
 	write_lock(&EXT4_I(inode)->i_es_lock);
 
-	err1 = __es_remove_extent(inode, lblk, end, NULL, es1);
+	err1 = __es_remove_extent(inode, lblk, end, &rinfo, es1);
 	if (err1 != 0)
 		goto error;
 	/* Free preallocated extent if it didn't get used. */
@@ -926,9 +928,27 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
 			__free_pending(pr);
 			pr = NULL;
 		}
+		pending = err3;
 	}
 error:
 	write_unlock(&EXT4_I(inode)->i_es_lock);
+	/*
+	 * Reduce the reserved cluster count to reflect successful deferred
+	 * allocation of delayed allocated clusters or direct allocation of
+	 * clusters discovered to be delayed allocated.  Once allocated, a
+	 * cluster is not included in the reserved count.
+	 *
+	 * When bigalloc is enabled, allocating non-delayed allocated blocks
+	 * which belong to delayed allocated clusters (from fallocate, filemap,
+	 * DIO, or clusters allocated when delalloc has been disabled by
+	 * ext4_nonda_switch()). Quota has been claimed by ext4_mb_new_blocks(),
+	 * so release the quota reservations made for any previously delayed
+	 * allocated clusters.
+	 */
+	resv_used = rinfo.delonly_cluster + pending;
+	if (resv_used)
+		ext4_da_update_reserve_space(inode, resv_used,
+					     rinfo.delonly_block);
 	if (err1 || err2 || err3 < 0)
 		goto retry;
 
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index d8ca7f64f952..7404f0935c90 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -652,13 +652,6 @@ int ext4_ind_map_blocks(handle_t *handle, struct inode *inode,
 	ext4_update_inode_fsync_trans(handle, inode, 1);
 	count = ar.len;
 
-	/*
-	 * Update reserved blocks/metadata blocks after successful block
-	 * allocation which had been deferred till now.
-	 */
-	if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
-		ext4_da_update_reserve_space(inode, count, 1);
-
 got_it:
 	map->m_flags |= EXT4_MAP_MAPPED;
 	map->m_pblk = le32_to_cpu(chain[depth-1].key);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 07/10] ext4: drop ext4_es_delayed_clu()
  2024-08-02 11:51 [PATCH v2 00/10] ext4: simplify the counting and management of delalloc reserved blocks Zhang Yi
                   ` (5 preceding siblings ...)
  2024-08-02 11:51 ` [PATCH v2 06/10] ext4: update delalloc data reserve spcae in ext4_es_insert_extent() Zhang Yi
@ 2024-08-02 11:51 ` Zhang Yi
  2024-08-02 11:51 ` [PATCH v2 08/10] ext4: use ext4_map_query_blocks() in ext4_map_blocks() Zhang Yi
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Zhang Yi @ 2024-08-02 11:51 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

Since we move ext4_da_update_reserve_space() to ext4_es_insert_extent(),
no one uses ext4_es_delayed_clu() and __es_delayed_clu(), just drop
them.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/extents_status.c | 88 ----------------------------------------
 fs/ext4/extents_status.h |  2 -
 2 files changed, 90 deletions(-)

diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 2daf61cfcf58..e482ac818317 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -2191,94 +2191,6 @@ void ext4_es_insert_delayed_extent(struct inode *inode, ext4_lblk_t lblk,
 	return;
 }
 
-/*
- * __es_delayed_clu - count number of clusters containing blocks that
- *                    are delayed only
- *
- * @inode - file containing block range
- * @start - logical block defining start of range
- * @end - logical block defining end of range
- *
- * Returns the number of clusters containing only delayed (not delayed
- * and unwritten) blocks in the range specified by @start and @end.  Any
- * cluster or part of a cluster within the range and containing a delayed
- * and not unwritten block within the range is counted as a whole cluster.
- */
-static unsigned int __es_delayed_clu(struct inode *inode, ext4_lblk_t start,
-				     ext4_lblk_t end)
-{
-	struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
-	struct extent_status *es;
-	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
-	struct rb_node *node;
-	ext4_lblk_t first_lclu, last_lclu;
-	unsigned long long last_counted_lclu;
-	unsigned int n = 0;
-
-	/* guaranteed to be unequal to any ext4_lblk_t value */
-	last_counted_lclu = ~0ULL;
-
-	es = __es_tree_search(&tree->root, start);
-
-	while (es && (es->es_lblk <= end)) {
-		if (ext4_es_is_delonly(es)) {
-			if (es->es_lblk <= start)
-				first_lclu = EXT4_B2C(sbi, start);
-			else
-				first_lclu = EXT4_B2C(sbi, es->es_lblk);
-
-			if (ext4_es_end(es) >= end)
-				last_lclu = EXT4_B2C(sbi, end);
-			else
-				last_lclu = EXT4_B2C(sbi, ext4_es_end(es));
-
-			if (first_lclu == last_counted_lclu)
-				n += last_lclu - first_lclu;
-			else
-				n += last_lclu - first_lclu + 1;
-			last_counted_lclu = last_lclu;
-		}
-		node = rb_next(&es->rb_node);
-		if (!node)
-			break;
-		es = rb_entry(node, struct extent_status, rb_node);
-	}
-
-	return n;
-}
-
-/*
- * ext4_es_delayed_clu - count number of clusters containing blocks that
- *                       are both delayed and unwritten
- *
- * @inode - file containing block range
- * @lblk - logical block defining start of range
- * @len - number of blocks in range
- *
- * Locking for external use of __es_delayed_clu().
- */
-unsigned int ext4_es_delayed_clu(struct inode *inode, ext4_lblk_t lblk,
-				 ext4_lblk_t len)
-{
-	struct ext4_inode_info *ei = EXT4_I(inode);
-	ext4_lblk_t end;
-	unsigned int n;
-
-	if (len == 0)
-		return 0;
-
-	end = lblk + len - 1;
-	WARN_ON(end < lblk);
-
-	read_lock(&ei->i_es_lock);
-
-	n = __es_delayed_clu(inode, lblk, end);
-
-	read_unlock(&ei->i_es_lock);
-
-	return n;
-}
-
 /*
  * __revise_pending - makes, cancels, or leaves unchanged pending cluster
  *                    reservations for a specified block range depending
diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
index 3c8e2edee5d5..5b49cb3b9aff 100644
--- a/fs/ext4/extents_status.h
+++ b/fs/ext4/extents_status.h
@@ -252,8 +252,6 @@ extern bool ext4_is_pending(struct inode *inode, ext4_lblk_t lblk);
 extern void ext4_es_insert_delayed_extent(struct inode *inode, ext4_lblk_t lblk,
 					  ext4_lblk_t len, bool lclu_allocated,
 					  bool end_allocated);
-extern unsigned int ext4_es_delayed_clu(struct inode *inode, ext4_lblk_t lblk,
-					ext4_lblk_t len);
 extern void ext4_clear_inode_es(struct inode *inode);
 
 #endif /* _EXT4_EXTENTS_STATUS_H */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 08/10] ext4: use ext4_map_query_blocks() in ext4_map_blocks()
  2024-08-02 11:51 [PATCH v2 00/10] ext4: simplify the counting and management of delalloc reserved blocks Zhang Yi
                   ` (6 preceding siblings ...)
  2024-08-02 11:51 ` [PATCH v2 07/10] ext4: drop ext4_es_delayed_clu() Zhang Yi
@ 2024-08-02 11:51 ` Zhang Yi
  2024-08-07 17:43   ` Jan Kara
  2024-08-02 11:51 ` [PATCH v2 09/10] ext4: drop ext4_es_is_delonly() Zhang Yi
  2024-08-02 11:51 ` [PATCH v2 10/10] ext4: drop all delonly descriptions Zhang Yi
  9 siblings, 1 reply; 25+ messages in thread
From: Zhang Yi @ 2024-08-02 11:51 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

The blocks map querying logic in ext4_map_blocks() are the same as
ext4_map_query_blocks(), so switch to directly use it.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inode.c | 22 +---------------------
 1 file changed, 1 insertion(+), 21 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e9ce1e4e6acb..8bd65a45a26a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -658,27 +658,7 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 	 * file system block.
 	 */
 	down_read(&EXT4_I(inode)->i_data_sem);
-	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
-		retval = ext4_ext_map_blocks(handle, inode, map, 0);
-	} else {
-		retval = ext4_ind_map_blocks(handle, inode, map, 0);
-	}
-	if (retval > 0) {
-		unsigned int status;
-
-		if (unlikely(retval != map->m_len)) {
-			ext4_warning(inode->i_sb,
-				     "ES len assertion failed for inode "
-				     "%lu: retval %d != map->m_len %d",
-				     inode->i_ino, retval, map->m_len);
-			WARN_ON(1);
-		}
-
-		status = map->m_flags & EXT4_MAP_UNWRITTEN ?
-				EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
-		ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
-				      map->m_pblk, status);
-	}
+	retval = ext4_map_query_blocks(handle, inode, map);
 	up_read((&EXT4_I(inode)->i_data_sem));
 
 found:
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 09/10] ext4: drop ext4_es_is_delonly()
  2024-08-02 11:51 [PATCH v2 00/10] ext4: simplify the counting and management of delalloc reserved blocks Zhang Yi
                   ` (7 preceding siblings ...)
  2024-08-02 11:51 ` [PATCH v2 08/10] ext4: use ext4_map_query_blocks() in ext4_map_blocks() Zhang Yi
@ 2024-08-02 11:51 ` Zhang Yi
  2024-08-07 17:48   ` Jan Kara
  2024-08-02 11:51 ` [PATCH v2 10/10] ext4: drop all delonly descriptions Zhang Yi
  9 siblings, 1 reply; 25+ messages in thread
From: Zhang Yi @ 2024-08-02 11:51 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

Since we don't add delayed flag in unwritten extents, so there is no
difference between ext4_es_is_delayed() and ext4_es_is_delonly(),
just drop ext4_es_is_delonly().

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/extents_status.c | 18 +++++++++---------
 fs/ext4/extents_status.h |  5 -----
 fs/ext4/inode.c          |  4 ++--
 3 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index e482ac818317..5fb0a02405ba 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -563,8 +563,8 @@ static int ext4_es_can_be_merged(struct extent_status *es1,
 	if (ext4_es_is_hole(es1))
 		return 1;
 
-	/* we need to check delayed extent is without unwritten status */
-	if (ext4_es_is_delayed(es1) && !ext4_es_is_unwritten(es1))
+	/* we need to check delayed extent */
+	if (ext4_es_is_delayed(es1))
 		return 1;
 
 	return 0;
@@ -1139,7 +1139,7 @@ static void count_rsvd(struct inode *inode, ext4_lblk_t lblk, long len,
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	ext4_lblk_t i, end, nclu;
 
-	if (!ext4_es_is_delonly(es))
+	if (!ext4_es_is_delayed(es))
 		return;
 
 	WARN_ON(len <= 0);
@@ -1291,7 +1291,7 @@ static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end,
 		es = rc->left_es;
 		while (es && ext4_es_end(es) >=
 		       EXT4_LBLK_CMASK(sbi, rc->first_do_lblk)) {
-			if (ext4_es_is_delonly(es)) {
+			if (ext4_es_is_delayed(es)) {
 				rc->ndelonly_cluster--;
 				left_delonly = true;
 				break;
@@ -1311,7 +1311,7 @@ static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end,
 			}
 			while (es && es->es_lblk <=
 			       EXT4_LBLK_CFILL(sbi, rc->last_do_lblk)) {
-				if (ext4_es_is_delonly(es)) {
+				if (ext4_es_is_delayed(es)) {
 					rc->ndelonly_cluster--;
 					right_delonly = true;
 					break;
@@ -2239,7 +2239,7 @@ static int __revise_pending(struct inode *inode, ext4_lblk_t lblk,
 	if (EXT4_B2C(sbi, lblk) == EXT4_B2C(sbi, end)) {
 		first = EXT4_LBLK_CMASK(sbi, lblk);
 		if (first != lblk)
-			f_del = __es_scan_range(inode, &ext4_es_is_delonly,
+			f_del = __es_scan_range(inode, &ext4_es_is_delayed,
 						first, lblk - 1);
 		if (f_del) {
 			ret = __insert_pending(inode, first, prealloc);
@@ -2251,7 +2251,7 @@ static int __revise_pending(struct inode *inode, ext4_lblk_t lblk,
 			       sbi->s_cluster_ratio - 1;
 			if (last != end)
 				l_del = __es_scan_range(inode,
-							&ext4_es_is_delonly,
+							&ext4_es_is_delayed,
 							end + 1, last);
 			if (l_del) {
 				ret = __insert_pending(inode, last, prealloc);
@@ -2264,7 +2264,7 @@ static int __revise_pending(struct inode *inode, ext4_lblk_t lblk,
 	} else {
 		first = EXT4_LBLK_CMASK(sbi, lblk);
 		if (first != lblk)
-			f_del = __es_scan_range(inode, &ext4_es_is_delonly,
+			f_del = __es_scan_range(inode, &ext4_es_is_delayed,
 						first, lblk - 1);
 		if (f_del) {
 			ret = __insert_pending(inode, first, prealloc);
@@ -2276,7 +2276,7 @@ static int __revise_pending(struct inode *inode, ext4_lblk_t lblk,
 
 		last = EXT4_LBLK_CMASK(sbi, end) + sbi->s_cluster_ratio - 1;
 		if (last != end)
-			l_del = __es_scan_range(inode, &ext4_es_is_delonly,
+			l_del = __es_scan_range(inode, &ext4_es_is_delayed,
 						end + 1, last);
 		if (l_del) {
 			ret = __insert_pending(inode, last, prealloc);
diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
index 5b49cb3b9aff..e484c60e55e3 100644
--- a/fs/ext4/extents_status.h
+++ b/fs/ext4/extents_status.h
@@ -184,11 +184,6 @@ static inline int ext4_es_is_mapped(struct extent_status *es)
 	return (ext4_es_is_written(es) || ext4_es_is_unwritten(es));
 }
 
-static inline int ext4_es_is_delonly(struct extent_status *es)
-{
-	return (ext4_es_is_delayed(es) && !ext4_es_is_unwritten(es));
-}
-
 static inline void ext4_es_set_referenced(struct extent_status *es)
 {
 	es->es_pblk |= ((ext4_fsblk_t)EXTENT_STATUS_REFERENCED) << ES_SHIFT;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 8bd65a45a26a..2b301c165468 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1645,7 +1645,7 @@ static int ext4_clu_alloc_state(struct inode *inode, ext4_lblk_t lblk)
 	int ret;
 
 	/* Has delalloc reservation? */
-	if (ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk))
+	if (ext4_es_scan_clu(inode, &ext4_es_is_delayed, lblk))
 		return 1;
 
 	/* Already been allocated? */
@@ -1766,7 +1766,7 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map)
 		 * Delayed extent could be allocated by fallocate.
 		 * So we need to check it.
 		 */
-		if (ext4_es_is_delonly(&es)) {
+		if (ext4_es_is_delayed(&es)) {
 			map->m_flags |= EXT4_MAP_DELAYED;
 			return 0;
 		}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 10/10] ext4: drop all delonly descriptions
  2024-08-02 11:51 [PATCH v2 00/10] ext4: simplify the counting and management of delalloc reserved blocks Zhang Yi
                   ` (8 preceding siblings ...)
  2024-08-02 11:51 ` [PATCH v2 09/10] ext4: drop ext4_es_is_delonly() Zhang Yi
@ 2024-08-02 11:51 ` Zhang Yi
  9 siblings, 0 replies; 25+ messages in thread
From: Zhang Yi @ 2024-08-02 11:51 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

Drop all delonly descriptions in parameters and comments.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/extents_status.c | 92 ++++++++++++++++++++--------------------
 1 file changed, 45 insertions(+), 47 deletions(-)

diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 5fb0a02405ba..54ef599869f7 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -142,8 +142,8 @@
  */
 
 struct rsvd_info {
-	int delonly_cluster;	/* reserved clusters for delalloc es entry */
-	int delonly_block;	/* reserved blocks for delalloc es entry */
+	int delayed_cluster;	/* reserved clusters for delalloc es entry */
+	int delayed_block;	/* reserved blocks for delalloc es entry */
 };
 
 static struct kmem_cache *ext4_es_cachep;
@@ -945,10 +945,10 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
 	 * so release the quota reservations made for any previously delayed
 	 * allocated clusters.
 	 */
-	resv_used = rinfo.delonly_cluster + pending;
+	resv_used = rinfo.delayed_cluster + pending;
 	if (resv_used)
 		ext4_da_update_reserve_space(inode, resv_used,
-					     rinfo.delonly_block);
+					     rinfo.delayed_block);
 	if (err1 || err2 || err3 < 0)
 		goto retry;
 
@@ -1069,8 +1069,8 @@ int ext4_es_lookup_extent(struct inode *inode, ext4_lblk_t lblk,
 }
 
 struct rsvd_count {
-	int ndelonly_cluster;
-	int ndelonly_block;
+	int ndelayed_cluster;
+	int ndelayed_block;
 	bool first_do_lblk_found;
 	ext4_lblk_t first_do_lblk;
 	ext4_lblk_t last_do_lblk;
@@ -1096,11 +1096,11 @@ static void init_rsvd(struct inode *inode, ext4_lblk_t lblk,
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	struct rb_node *node;
 
-	rc->ndelonly_cluster = 0;
-	rc->ndelonly_block = 0;
+	rc->ndelayed_cluster = 0;
+	rc->ndelayed_block = 0;
 
 	/*
-	 * for bigalloc, note the first delonly block in the range has not
+	 * for bigalloc, note the first delayed block in the range has not
 	 * been found, record the extent containing the block to the left of
 	 * the region to be removed, if any, and note that there's no partial
 	 * cluster to track
@@ -1120,9 +1120,8 @@ static void init_rsvd(struct inode *inode, ext4_lblk_t lblk,
 }
 
 /*
- * count_rsvd - count the clusters containing delayed and not unwritten
- *		(delonly) blocks in a range within an extent and add to
- *	        the running tally in rsvd_count
+ * count_rsvd - count the clusters containing delayed blocks in a range
+ *	        within an extent and add to the running tally in rsvd_count
  *
  * @inode - file containing extent
  * @lblk - first block in range
@@ -1145,19 +1144,19 @@ static void count_rsvd(struct inode *inode, ext4_lblk_t lblk, long len,
 	WARN_ON(len <= 0);
 
 	if (sbi->s_cluster_ratio == 1) {
-		rc->ndelonly_cluster += (int) len;
-		rc->ndelonly_block = rc->ndelonly_cluster;
+		rc->ndelayed_cluster += (int) len;
+		rc->ndelayed_block = rc->ndelayed_cluster;
 		return;
 	}
 
 	/* bigalloc */
-	rc->ndelonly_block += (int)len;
+	rc->ndelayed_block += (int)len;
 
 	i = (lblk < es->es_lblk) ? es->es_lblk : lblk;
 	end = lblk + (ext4_lblk_t) len - 1;
 	end = (end > ext4_es_end(es)) ? ext4_es_end(es) : end;
 
-	/* record the first block of the first delonly extent seen */
+	/* record the first block of the first delayed extent seen */
 	if (!rc->first_do_lblk_found) {
 		rc->first_do_lblk = i;
 		rc->first_do_lblk_found = true;
@@ -1171,7 +1170,7 @@ static void count_rsvd(struct inode *inode, ext4_lblk_t lblk, long len,
 	 * doesn't start with it, count it and stop tracking
 	 */
 	if (rc->partial && (rc->lclu != EXT4_B2C(sbi, i))) {
-		rc->ndelonly_cluster++;
+		rc->ndelayed_cluster++;
 		rc->partial = false;
 	}
 
@@ -1181,7 +1180,7 @@ static void count_rsvd(struct inode *inode, ext4_lblk_t lblk, long len,
 	 */
 	if (EXT4_LBLK_COFF(sbi, i) != 0) {
 		if (end >= EXT4_LBLK_CFILL(sbi, i)) {
-			rc->ndelonly_cluster++;
+			rc->ndelayed_cluster++;
 			rc->partial = false;
 			i = EXT4_LBLK_CFILL(sbi, i) + 1;
 		}
@@ -1189,11 +1188,11 @@ static void count_rsvd(struct inode *inode, ext4_lblk_t lblk, long len,
 
 	/*
 	 * if the current cluster starts on a cluster boundary, count the
-	 * number of whole delonly clusters in the extent
+	 * number of whole delayed clusters in the extent
 	 */
 	if ((i + sbi->s_cluster_ratio - 1) <= end) {
 		nclu = (end - i + 1) >> sbi->s_cluster_bits;
-		rc->ndelonly_cluster += nclu;
+		rc->ndelayed_cluster += nclu;
 		i += nclu << sbi->s_cluster_bits;
 	}
 
@@ -1253,10 +1252,9 @@ static struct pending_reservation *__pr_tree_search(struct rb_root *root,
  * @rc - pointer to reserved count data
  *
  * The number of reservations to be released is equal to the number of
- * clusters containing delayed and not unwritten (delonly) blocks within
- * the range, minus the number of clusters still containing delonly blocks
- * at the ends of the range, and minus the number of pending reservations
- * within the range.
+ * clusters containing delayed blocks within the range, minus the number of
+ * clusters still containing delayed blocks at the ends of the range, and
+ * minus the number of pending reservations within the range.
  */
 static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end,
 			     struct extent_status *right_es,
@@ -1267,33 +1265,33 @@ static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end,
 	struct ext4_pending_tree *tree = &EXT4_I(inode)->i_pending_tree;
 	struct rb_node *node;
 	ext4_lblk_t first_lclu, last_lclu;
-	bool left_delonly, right_delonly, count_pending;
+	bool left_delayed, right_delayed, count_pending;
 	struct extent_status *es;
 
 	if (sbi->s_cluster_ratio > 1) {
 		/* count any remaining partial cluster */
 		if (rc->partial)
-			rc->ndelonly_cluster++;
+			rc->ndelayed_cluster++;
 
-		if (rc->ndelonly_cluster == 0)
+		if (rc->ndelayed_cluster == 0)
 			return 0;
 
 		first_lclu = EXT4_B2C(sbi, rc->first_do_lblk);
 		last_lclu = EXT4_B2C(sbi, rc->last_do_lblk);
 
 		/*
-		 * decrease the delonly count by the number of clusters at the
-		 * ends of the range that still contain delonly blocks -
+		 * decrease the delayed count by the number of clusters at the
+		 * ends of the range that still contain delayed blocks -
 		 * these clusters still need to be reserved
 		 */
-		left_delonly = right_delonly = false;
+		left_delayed = right_delayed = false;
 
 		es = rc->left_es;
 		while (es && ext4_es_end(es) >=
 		       EXT4_LBLK_CMASK(sbi, rc->first_do_lblk)) {
 			if (ext4_es_is_delayed(es)) {
-				rc->ndelonly_cluster--;
-				left_delonly = true;
+				rc->ndelayed_cluster--;
+				left_delayed = true;
 				break;
 			}
 			node = rb_prev(&es->rb_node);
@@ -1301,7 +1299,7 @@ static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end,
 				break;
 			es = rb_entry(node, struct extent_status, rb_node);
 		}
-		if (right_es && (!left_delonly || first_lclu != last_lclu)) {
+		if (right_es && (!left_delayed || first_lclu != last_lclu)) {
 			if (end < ext4_es_end(right_es)) {
 				es = right_es;
 			} else {
@@ -1312,8 +1310,8 @@ static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end,
 			while (es && es->es_lblk <=
 			       EXT4_LBLK_CFILL(sbi, rc->last_do_lblk)) {
 				if (ext4_es_is_delayed(es)) {
-					rc->ndelonly_cluster--;
-					right_delonly = true;
+					rc->ndelayed_cluster--;
+					right_delayed = true;
 					break;
 				}
 				node = rb_next(&es->rb_node);
@@ -1327,21 +1325,21 @@ static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end,
 		/*
 		 * Determine the block range that should be searched for
 		 * pending reservations, if any.  Clusters on the ends of the
-		 * original removed range containing delonly blocks are
+		 * original removed range containing delayed blocks are
 		 * excluded.  They've already been accounted for and it's not
 		 * possible to determine if an associated pending reservation
 		 * should be released with the information available in the
 		 * extents status tree.
 		 */
 		if (first_lclu == last_lclu) {
-			if (left_delonly | right_delonly)
+			if (left_delayed | right_delayed)
 				count_pending = false;
 			else
 				count_pending = true;
 		} else {
-			if (left_delonly)
+			if (left_delayed)
 				first_lclu++;
-			if (right_delonly)
+			if (right_delayed)
 				last_lclu--;
 			if (first_lclu <= last_lclu)
 				count_pending = true;
@@ -1352,13 +1350,13 @@ static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end,
 		/*
 		 * a pending reservation found between first_lclu and last_lclu
 		 * represents an allocated cluster that contained at least one
-		 * delonly block, so the delonly total must be reduced by one
+		 * delayed block, so the delayed total must be reduced by one
 		 * for each pending reservation found and released
 		 */
 		if (count_pending) {
 			pr = __pr_tree_search(&tree->root, first_lclu);
 			while (pr && pr->lclu <= last_lclu) {
-				rc->ndelonly_cluster--;
+				rc->ndelayed_cluster--;
 				node = rb_next(&pr->rb_node);
 				rb_erase(&pr->rb_node, &tree->root);
 				__free_pending(pr);
@@ -1369,7 +1367,7 @@ static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end,
 			}
 		}
 	}
-	return rc->ndelonly_cluster;
+	return rc->ndelayed_cluster;
 }
 
 
@@ -1403,8 +1401,8 @@ static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
 	struct rsvd_count rc;
 
 	if (rinfo) {
-		rinfo->delonly_cluster = 0;
-		rinfo->delonly_block = 0;
+		rinfo->delayed_cluster = 0;
+		rinfo->delayed_block = 0;
 		if (test_opt(inode->i_sb, DELALLOC))
 			count_reserved = true;
 	}
@@ -1506,8 +1504,8 @@ static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
 
 out_get_reserved:
 	if (count_reserved) {
-		rinfo->delonly_cluster = get_rsvd(inode, end, es, &rc);
-		rinfo->delonly_block = rc.ndelonly_block;
+		rinfo->delayed_cluster = get_rsvd(inode, end, es, &rc);
+		rinfo->delayed_block = rc.ndelayed_block;
 	}
 out:
 	return err;
@@ -1565,7 +1563,7 @@ void ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
 		goto retry;
 
 	ext4_es_print_tree(inode);
-	ext4_da_release_space(inode, rinfo.delonly_cluster);
+	ext4_da_release_space(inode, rinfo.delayed_cluster);
 	return;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 01/10] ext4: factor out ext4_map_create_blocks() to allocate new blocks
  2024-08-02 11:51 ` [PATCH v2 01/10] ext4: factor out ext4_map_create_blocks() to allocate new blocks Zhang Yi
@ 2024-08-06 14:38   ` Jan Kara
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2024-08-06 14:38 UTC (permalink / raw)
  To: Zhang Yi
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Fri 02-08-24 19:51:11, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> Factor out a common helper ext4_map_create_blocks() from
> ext4_map_blocks() to do a real blocks allocation, no logic changes.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Looks OK. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/inode.c | 157 +++++++++++++++++++++++++-----------------------
>  1 file changed, 81 insertions(+), 76 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 941c1c0d5c6e..112aec171ee9 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -482,6 +482,86 @@ static int ext4_map_query_blocks(handle_t *handle, struct inode *inode,
>  	return retval;
>  }
>  
> +static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
> +				  struct ext4_map_blocks *map, int flags)
> +{
> +	struct extent_status es;
> +	unsigned int status;
> +	int err, retval = 0;
> +
> +	/*
> +	 * Here we clear m_flags because after allocating an new extent,
> +	 * it will be set again.
> +	 */
> +	map->m_flags &= ~EXT4_MAP_FLAGS;
> +
> +	/*
> +	 * We need to check for EXT4 here because migrate could have
> +	 * changed the inode type in between.
> +	 */
> +	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
> +		retval = ext4_ext_map_blocks(handle, inode, map, flags);
> +	} else {
> +		retval = ext4_ind_map_blocks(handle, inode, map, flags);
> +
> +		/*
> +		 * We allocated new blocks which will result in i_data's
> +		 * format changing. Force the migrate to fail by clearing
> +		 * migrate flags.
> +		 */
> +		if (retval > 0 && map->m_flags & EXT4_MAP_NEW)
> +			ext4_clear_inode_state(inode, EXT4_STATE_EXT_MIGRATE);
> +	}
> +	if (retval <= 0)
> +		return retval;
> +
> +	if (unlikely(retval != map->m_len)) {
> +		ext4_warning(inode->i_sb,
> +			     "ES len assertion failed for inode %lu: "
> +			     "retval %d != map->m_len %d",
> +			     inode->i_ino, retval, map->m_len);
> +		WARN_ON(1);
> +	}
> +
> +	/*
> +	 * We have to zeroout blocks before inserting them into extent
> +	 * status tree. Otherwise someone could look them up there and
> +	 * use them before they are really zeroed. We also have to
> +	 * unmap metadata before zeroing as otherwise writeback can
> +	 * overwrite zeros with stale data from block device.
> +	 */
> +	if (flags & EXT4_GET_BLOCKS_ZERO &&
> +	    map->m_flags & EXT4_MAP_MAPPED && map->m_flags & EXT4_MAP_NEW) {
> +		err = ext4_issue_zeroout(inode, map->m_lblk, map->m_pblk,
> +					 map->m_len);
> +		if (err)
> +			return err;
> +	}
> +
> +	/*
> +	 * If the extent has been zeroed out, we don't need to update
> +	 * extent status tree.
> +	 */
> +	if (flags & EXT4_GET_BLOCKS_PRE_IO &&
> +	    ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
> +		if (ext4_es_is_written(&es))
> +			return retval;
> +	}
> +
> +	status = map->m_flags & EXT4_MAP_UNWRITTEN ?
> +			EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
> +	if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
> +	    !(status & EXTENT_STATUS_WRITTEN) &&
> +	    ext4_es_scan_range(inode, &ext4_es_is_delayed, map->m_lblk,
> +			       map->m_lblk + map->m_len - 1))
> +		status |= EXTENT_STATUS_DELAYED;
> +
> +	ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
> +			      map->m_pblk, status);
> +
> +	return retval;
> +}
> +
>  /*
>   * The ext4_map_blocks() function tries to look up the requested blocks,
>   * and returns if the blocks are already mapped.
> @@ -630,12 +710,6 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
>  		if (!(flags & EXT4_GET_BLOCKS_CONVERT_UNWRITTEN))
>  			return retval;
>  
> -	/*
> -	 * Here we clear m_flags because after allocating an new extent,
> -	 * it will be set again.
> -	 */
> -	map->m_flags &= ~EXT4_MAP_FLAGS;
> -
>  	/*
>  	 * New blocks allocate and/or writing to unwritten extent
>  	 * will possibly result in updating i_data, so we take
> @@ -643,76 +717,7 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
>  	 * with create == 1 flag.
>  	 */
>  	down_write(&EXT4_I(inode)->i_data_sem);
> -
> -	/*
> -	 * We need to check for EXT4 here because migrate
> -	 * could have changed the inode type in between
> -	 */
> -	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
> -		retval = ext4_ext_map_blocks(handle, inode, map, flags);
> -	} else {
> -		retval = ext4_ind_map_blocks(handle, inode, map, flags);
> -
> -		if (retval > 0 && map->m_flags & EXT4_MAP_NEW) {
> -			/*
> -			 * We allocated new blocks which will result in
> -			 * i_data's format changing.  Force the migrate
> -			 * to fail by clearing migrate flags
> -			 */
> -			ext4_clear_inode_state(inode, EXT4_STATE_EXT_MIGRATE);
> -		}
> -	}
> -
> -	if (retval > 0) {
> -		unsigned int status;
> -
> -		if (unlikely(retval != map->m_len)) {
> -			ext4_warning(inode->i_sb,
> -				     "ES len assertion failed for inode "
> -				     "%lu: retval %d != map->m_len %d",
> -				     inode->i_ino, retval, map->m_len);
> -			WARN_ON(1);
> -		}
> -
> -		/*
> -		 * We have to zeroout blocks before inserting them into extent
> -		 * status tree. Otherwise someone could look them up there and
> -		 * use them before they are really zeroed. We also have to
> -		 * unmap metadata before zeroing as otherwise writeback can
> -		 * overwrite zeros with stale data from block device.
> -		 */
> -		if (flags & EXT4_GET_BLOCKS_ZERO &&
> -		    map->m_flags & EXT4_MAP_MAPPED &&
> -		    map->m_flags & EXT4_MAP_NEW) {
> -			ret = ext4_issue_zeroout(inode, map->m_lblk,
> -						 map->m_pblk, map->m_len);
> -			if (ret) {
> -				retval = ret;
> -				goto out_sem;
> -			}
> -		}
> -
> -		/*
> -		 * If the extent has been zeroed out, we don't need to update
> -		 * extent status tree.
> -		 */
> -		if ((flags & EXT4_GET_BLOCKS_PRE_IO) &&
> -		    ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
> -			if (ext4_es_is_written(&es))
> -				goto out_sem;
> -		}
> -		status = map->m_flags & EXT4_MAP_UNWRITTEN ?
> -				EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
> -		if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
> -		    !(status & EXTENT_STATUS_WRITTEN) &&
> -		    ext4_es_scan_range(inode, &ext4_es_is_delayed, map->m_lblk,
> -				       map->m_lblk + map->m_len - 1))
> -			status |= EXTENT_STATUS_DELAYED;
> -		ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
> -				      map->m_pblk, status);
> -	}
> -
> -out_sem:
> +	retval = ext4_map_create_blocks(handle, inode, map, flags);
>  	up_write((&EXT4_I(inode)->i_data_sem));
>  	if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
>  		ret = check_block_validity(inode, map);
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 02/10] ext4: optimize the EXT4_GET_BLOCKS_DELALLOC_RESERVE flag set
  2024-08-02 11:51 ` [PATCH v2 02/10] ext4: optimize the EXT4_GET_BLOCKS_DELALLOC_RESERVE flag set Zhang Yi
@ 2024-08-06 14:48   ` Jan Kara
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2024-08-06 14:48 UTC (permalink / raw)
  To: Zhang Yi
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Fri 02-08-24 19:51:12, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> When doing block allocation, magic EXT4_GET_BLOCKS_DELALLOC_RESERVE
> means the allocating range covers a range of delayed allocated clusters,
> the blocks and quotas have already been reserved in ext4_da_map_blocks(),
> we should update the reserved space and don't need to claim them again.
> 
> At the moment, we only set this magic in mpage_map_one_extent() when
> allocating a range of delayed allocated clusters in the write back path,
> it makes things complicated since we have to notice and deal with the
> case of allocating non-delayed allocated clusters separately in
> ext4_ext_map_blocks(). For example, it we fallocate some blocks that
> have been delayed allocated, free space would be claimed again in
> ext4_mb_new_blocks() (this is wrong exactily), and we can't claim quota
> space again, we have to release the quota reservations made for that
> previously delayed allocated clusters.
> 
> Move the position thats set the EXT4_GET_BLOCKS_DELALLOC_RESERVE to
> where we actually do block allocation, it could simplify above handling
> a lot, it means that we always set this magic once the allocation range
> covers delalloc blocks, no need to take care of the allocation path.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Ah, nice idea. The patch looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/inode.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 112aec171ee9..91b2610a6dc5 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -489,6 +489,14 @@ static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
>  	unsigned int status;
>  	int err, retval = 0;
>  
> +	/*
> +	 * We pass in the magic EXT4_GET_BLOCKS_DELALLOC_RESERVE
> +	 * indicates that the blocks and quotas has already been
> +	 * checked when the data was copied into the page cache.
> +	 */
> +	if (map->m_flags & EXT4_MAP_DELAYED)
> +		flags |= EXT4_GET_BLOCKS_DELALLOC_RESERVE;
> +
>  	/*
>  	 * Here we clear m_flags because after allocating an new extent,
>  	 * it will be set again.
> @@ -2224,11 +2232,6 @@ static int mpage_map_one_extent(handle_t *handle, struct mpage_da_data *mpd)
>  	 * writeback and there is nothing we can do about it so it might result
>  	 * in data loss.  So use reserved blocks to allocate metadata if
>  	 * possible.
> -	 *
> -	 * We pass in the magic EXT4_GET_BLOCKS_DELALLOC_RESERVE if
> -	 * the blocks in question are delalloc blocks.  This indicates
> -	 * that the blocks and quotas has already been checked when
> -	 * the data was copied into the page cache.
>  	 */
>  	get_blocks_flags = EXT4_GET_BLOCKS_CREATE |
>  			   EXT4_GET_BLOCKS_METADATA_NOFAIL |
> @@ -2236,8 +2239,6 @@ static int mpage_map_one_extent(handle_t *handle, struct mpage_da_data *mpd)
>  	dioread_nolock = ext4_should_dioread_nolock(inode);
>  	if (dioread_nolock)
>  		get_blocks_flags |= EXT4_GET_BLOCKS_IO_CREATE_EXT;
> -	if (map->m_flags & BIT(BH_Delay))
> -		get_blocks_flags |= EXT4_GET_BLOCKS_DELALLOC_RESERVE;
>  
>  	err = ext4_map_blocks(handle, inode, map, get_blocks_flags);
>  	if (err < 0)
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 03/10] ext4: don't set EXTENT_STATUS_DELAYED on allocated blocks
  2024-08-02 11:51 ` [PATCH v2 03/10] ext4: don't set EXTENT_STATUS_DELAYED on allocated blocks Zhang Yi
@ 2024-08-06 15:23   ` Jan Kara
  2024-08-07 12:18     ` Zhang Yi
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kara @ 2024-08-06 15:23 UTC (permalink / raw)
  To: Zhang Yi
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Fri 02-08-24 19:51:13, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> Since we always set EXT4_GET_BLOCKS_DELALLOC_RESERVE when allocating
> delalloc blocks, there is no need to keep delayed flag on the unwritten
> extent status entry, so just drop it after allocation.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Let me improve the changelog because I was confused for some time before I
understood:

Currently, we release delayed allocation reservation when removing delayed
extent from extent status tree (which also happens when overwriting one
extent with another one). When we allocated unwritten extent under
some delayed allocated extent, we don't need the reservation anymore and
hence we don't need to preserve the EXT4_MAP_DELAYED status bit. Inserting
the new extent into extent status tree will properly release the
reservation.

Otherwise feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza


> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 91b2610a6dc5..e9ce1e4e6acb 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -558,12 +558,6 @@ static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
>  
>  	status = map->m_flags & EXT4_MAP_UNWRITTEN ?
>  			EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
> -	if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
> -	    !(status & EXTENT_STATUS_WRITTEN) &&
> -	    ext4_es_scan_range(inode, &ext4_es_is_delayed, map->m_lblk,
> -			       map->m_lblk + map->m_len - 1))
> -		status |= EXTENT_STATUS_DELAYED;
> -
>  	ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
>  			      map->m_pblk, status);
>  
> @@ -682,11 +676,6 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
>  
>  		status = map->m_flags & EXT4_MAP_UNWRITTEN ?
>  				EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
> -		if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
> -		    !(status & EXTENT_STATUS_WRITTEN) &&
> -		    ext4_es_scan_range(inode, &ext4_es_is_delayed, map->m_lblk,
> -				       map->m_lblk + map->m_len - 1))
> -			status |= EXTENT_STATUS_DELAYED;
>  		ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
>  				      map->m_pblk, status);
>  	}
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 03/10] ext4: don't set EXTENT_STATUS_DELAYED on allocated blocks
  2024-08-06 15:23   ` Jan Kara
@ 2024-08-07 12:18     ` Zhang Yi
  2024-08-07 13:37       ` Jan Kara
  0 siblings, 1 reply; 25+ messages in thread
From: Zhang Yi @ 2024-08-07 12:18 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	ritesh.list, yi.zhang, chengzhihao1, yukuai3

On 2024/8/6 23:23, Jan Kara wrote:
> On Fri 02-08-24 19:51:13, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> Since we always set EXT4_GET_BLOCKS_DELALLOC_RESERVE when allocating
>> delalloc blocks, there is no need to keep delayed flag on the unwritten
>> extent status entry, so just drop it after allocation.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> 
> Let me improve the changelog because I was confused for some time before I
> understood:
> 
> Currently, we release delayed allocation reservation when removing delayed
> extent from extent status tree (which also happens when overwriting one
> extent with another one). When we allocated unwritten extent under
> some delayed allocated extent, we don't need the reservation anymore and
> hence we don't need to preserve the EXT4_MAP_DELAYED status bit. Inserting
> the new extent into extent status tree will properly release the
> reservation.
> 

Thanks for your review and change log improvement. My original idea was very
simple, after patch 2, we always set EXT4_GET_BLOCKS_DELALLOC_RESERVE when
allocating blocks for delalloc extent, these two conditions in the 'if'
branch can never be true at the same time, so they become dead code and I
dropped them.

	if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
	    ext4_es_scan_range(inode, &ext4_es_is_delayed, ...)

But after thinking your change log, I agree with you that we have already
properly update the reservation by searching delayed blocks through
ext4_es_delayed_clu() in ext4_ext_map_blocks() when we allocated unwritten
extent under some delayed allocated extent even it's not from the write
back path, so I think we can also drop them even without patch 2. But just
one point, I think the last last sentence isn't exactly true before path 6,
should it be "Allocating the new extent blocks will properly release the
reservation." now ?

Thanks,
Yi.

> Otherwise feel free to add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> 
> 								Honza
> 
> 
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 91b2610a6dc5..e9ce1e4e6acb 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -558,12 +558,6 @@ static int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
>>  
>>  	status = map->m_flags & EXT4_MAP_UNWRITTEN ?
>>  			EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
>> -	if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
>> -	    !(status & EXTENT_STATUS_WRITTEN) &&
>> -	    ext4_es_scan_range(inode, &ext4_es_is_delayed, map->m_lblk,
>> -			       map->m_lblk + map->m_len - 1))
>> -		status |= EXTENT_STATUS_DELAYED;
>> -
>>  	ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
>>  			      map->m_pblk, status);
>>  
>> @@ -682,11 +676,6 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
>>  
>>  		status = map->m_flags & EXT4_MAP_UNWRITTEN ?
>>  				EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
>> -		if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
>> -		    !(status & EXTENT_STATUS_WRITTEN) &&
>> -		    ext4_es_scan_range(inode, &ext4_es_is_delayed, map->m_lblk,
>> -				       map->m_lblk + map->m_len - 1))
>> -			status |= EXTENT_STATUS_DELAYED;
>>  		ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
>>  				      map->m_pblk, status);
>>  	}
>> -- 
>> 2.39.2
>>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 03/10] ext4: don't set EXTENT_STATUS_DELAYED on allocated blocks
  2024-08-07 12:18     ` Zhang Yi
@ 2024-08-07 13:37       ` Jan Kara
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2024-08-07 13:37 UTC (permalink / raw)
  To: Zhang Yi
  Cc: Jan Kara, linux-ext4, linux-fsdevel, linux-kernel, tytso,
	adilger.kernel, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Wed 07-08-24 20:18:18, Zhang Yi wrote:
> On 2024/8/6 23:23, Jan Kara wrote:
> > On Fri 02-08-24 19:51:13, Zhang Yi wrote:
> >> From: Zhang Yi <yi.zhang@huawei.com>
> >>
> >> Since we always set EXT4_GET_BLOCKS_DELALLOC_RESERVE when allocating
> >> delalloc blocks, there is no need to keep delayed flag on the unwritten
> >> extent status entry, so just drop it after allocation.
> >>
> >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> > 
> > Let me improve the changelog because I was confused for some time before I
> > understood:
> > 
> > Currently, we release delayed allocation reservation when removing delayed
> > extent from extent status tree (which also happens when overwriting one
> > extent with another one). When we allocated unwritten extent under
> > some delayed allocated extent, we don't need the reservation anymore and
> > hence we don't need to preserve the EXT4_MAP_DELAYED status bit. Inserting
> > the new extent into extent status tree will properly release the
> > reservation.
> > 
> 
> Thanks for your review and change log improvement. My original idea was very
> simple, after patch 2, we always set EXT4_GET_BLOCKS_DELALLOC_RESERVE when
> allocating blocks for delalloc extent, these two conditions in the 'if'
> branch can never be true at the same time, so they become dead code and I
> dropped them.
> 
> 	if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
> 	    ext4_es_scan_range(inode, &ext4_es_is_delayed, ...)
> 
> But after thinking your change log, I agree with you that we have already
> properly update the reservation by searching delayed blocks through
> ext4_es_delayed_clu() in ext4_ext_map_blocks() when we allocated unwritten
> extent under some delayed allocated extent even it's not from the write
> back path, so I think we can also drop them even without patch 2. But just
> one point, I think the last last sentence isn't exactly true before path 6,
> should it be "Allocating the new extent blocks will properly release the
> reservation." now ?

Now you've got me confused again ;) Why I wrote the changelog that way is
because ext4_es_remove_extent() is calling ext4_da_release_space(). But now
I've realized I've confused ext4_es_remove_extent() with
__es_remove_extent() which is what gets called when inserting another
extent. So I was wrong and indeed your version of the last sentense is
correct. Thanks for catching this!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 06/10] ext4: update delalloc data reserve spcae in ext4_es_insert_extent()
  2024-08-02 11:51 ` [PATCH v2 06/10] ext4: update delalloc data reserve spcae in ext4_es_insert_extent() Zhang Yi
@ 2024-08-07 17:41   ` Jan Kara
  2024-08-08 11:18     ` Zhang Yi
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kara @ 2024-08-07 17:41 UTC (permalink / raw)
  To: Zhang Yi
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Fri 02-08-24 19:51:16, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> Now that we update data reserved space for delalloc after allocating
> new blocks in ext4_{ind|ext}_map_blocks(), and if bigalloc feature is
> enabled, we also need to query the extents_status tree to calculate the
> exact reserved clusters. This is complicated now and it appears that
> it's better to do this job in ext4_es_insert_extent(), because
> __es_remove_extent() have already count delalloc blocks when removing
> delalloc extents and __revise_pending() return new adding pending count,
> we could update the reserved blocks easily in ext4_es_insert_extent().
> 
> Thers is one special case needs to concern is the quota claiming, when
> bigalloc is enabled, if the delayed cluster allocation has been raced
> by another no-delayed allocation(e.g. from fallocate) which doesn't
> cover the delayed blocks:
> 
>   |<       one cluster       >|
>   hhhhhhhhhhhhhhhhhhhdddddddddd
>   ^            ^
>   |<          >| < fallocate this range, don't claim quota again
> 
> We can't claim quota as usual because the fallocate has already claimed
> it in ext4_mb_new_blocks(), we could notice this case through the
> removed delalloc blocks count.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
...
> @@ -926,9 +928,27 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
>  			__free_pending(pr);
>  			pr = NULL;
>  		}
> +		pending = err3;
>  	}
>  error:
>  	write_unlock(&EXT4_I(inode)->i_es_lock);
> +	/*
> +	 * Reduce the reserved cluster count to reflect successful deferred
> +	 * allocation of delayed allocated clusters or direct allocation of
> +	 * clusters discovered to be delayed allocated.  Once allocated, a
> +	 * cluster is not included in the reserved count.
> +	 *
> +	 * When bigalloc is enabled, allocating non-delayed allocated blocks
> +	 * which belong to delayed allocated clusters (from fallocate, filemap,
> +	 * DIO, or clusters allocated when delalloc has been disabled by
> +	 * ext4_nonda_switch()). Quota has been claimed by ext4_mb_new_blocks(),
> +	 * so release the quota reservations made for any previously delayed
> +	 * allocated clusters.
> +	 */
> +	resv_used = rinfo.delonly_cluster + pending;
> +	if (resv_used)
> +		ext4_da_update_reserve_space(inode, resv_used,
> +					     rinfo.delonly_block);

I'm not sure I understand here. We are inserting extent into extent status
tree. We are replacing resv_used clusters worth of space with delayed
allocation reservation with normally allocated clusters so we need to
release the reservation (mballoc already reduced freeclusters counter).
That I understand. In normal case we should also claim quota because we are
converting from reserved into allocated state. Now if we allocated blocks
under this range (e.g. from fallocate()) without
EXT4_GET_BLOCKS_DELALLOC_RESERVE, we need to release quota reservation here
instead of claiming it. But I fail to see how rinfo.delonly_block > 0 is
related to whether EXT4_GET_BLOCKS_DELALLOC_RESERVE was set when allocating
blocks for this extent or not.

At this point it would seem much clearer if we passed flag to
ext4_es_insert_extent() whether EXT4_GET_BLOCKS_DELALLOC_RESERVE was set
when allocating extent or not instead of computing delonly_block and
somehow infering from that. But maybe I miss some obvious reason why that
is correct.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 08/10] ext4: use ext4_map_query_blocks() in ext4_map_blocks()
  2024-08-02 11:51 ` [PATCH v2 08/10] ext4: use ext4_map_query_blocks() in ext4_map_blocks() Zhang Yi
@ 2024-08-07 17:43   ` Jan Kara
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2024-08-07 17:43 UTC (permalink / raw)
  To: Zhang Yi
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Fri 02-08-24 19:51:18, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> The blocks map querying logic in ext4_map_blocks() are the same as
> ext4_map_query_blocks(), so switch to directly use it.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Sure. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/inode.c | 22 +---------------------
>  1 file changed, 1 insertion(+), 21 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index e9ce1e4e6acb..8bd65a45a26a 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -658,27 +658,7 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
>  	 * file system block.
>  	 */
>  	down_read(&EXT4_I(inode)->i_data_sem);
> -	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
> -		retval = ext4_ext_map_blocks(handle, inode, map, 0);
> -	} else {
> -		retval = ext4_ind_map_blocks(handle, inode, map, 0);
> -	}
> -	if (retval > 0) {
> -		unsigned int status;
> -
> -		if (unlikely(retval != map->m_len)) {
> -			ext4_warning(inode->i_sb,
> -				     "ES len assertion failed for inode "
> -				     "%lu: retval %d != map->m_len %d",
> -				     inode->i_ino, retval, map->m_len);
> -			WARN_ON(1);
> -		}
> -
> -		status = map->m_flags & EXT4_MAP_UNWRITTEN ?
> -				EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
> -		ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
> -				      map->m_pblk, status);
> -	}
> +	retval = ext4_map_query_blocks(handle, inode, map);
>  	up_read((&EXT4_I(inode)->i_data_sem));
>  
>  found:
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 09/10] ext4: drop ext4_es_is_delonly()
  2024-08-02 11:51 ` [PATCH v2 09/10] ext4: drop ext4_es_is_delonly() Zhang Yi
@ 2024-08-07 17:48   ` Jan Kara
  2024-08-08 11:21     ` Zhang Yi
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kara @ 2024-08-07 17:48 UTC (permalink / raw)
  To: Zhang Yi
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Fri 02-08-24 19:51:19, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> Since we don't add delayed flag in unwritten extents, so there is no
> difference between ext4_es_is_delayed() and ext4_es_is_delonly(),
> just drop ext4_es_is_delonly().
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Looks good. But please also add assertion when inserting extent into status
tree that only one of EXTENT_STATUS_WRITTEN, EXTENT_STATUS_UNWRITTEN,
EXTENT_STATUS_DELAYED, and EXTENT_STATUS_HOLE is set.
Also perhaps add comment to EXTENT_STATUS_DELAYED (and other) definition that
these states are exclusive. Thanks!

									Honza

> ---
>  fs/ext4/extents_status.c | 18 +++++++++---------
>  fs/ext4/extents_status.h |  5 -----
>  fs/ext4/inode.c          |  4 ++--
>  3 files changed, 11 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
> index e482ac818317..5fb0a02405ba 100644
> --- a/fs/ext4/extents_status.c
> +++ b/fs/ext4/extents_status.c
> @@ -563,8 +563,8 @@ static int ext4_es_can_be_merged(struct extent_status *es1,
>  	if (ext4_es_is_hole(es1))
>  		return 1;
>  
> -	/* we need to check delayed extent is without unwritten status */
> -	if (ext4_es_is_delayed(es1) && !ext4_es_is_unwritten(es1))
> +	/* we need to check delayed extent */
> +	if (ext4_es_is_delayed(es1))
>  		return 1;
>  
>  	return 0;
> @@ -1139,7 +1139,7 @@ static void count_rsvd(struct inode *inode, ext4_lblk_t lblk, long len,
>  	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
>  	ext4_lblk_t i, end, nclu;
>  
> -	if (!ext4_es_is_delonly(es))
> +	if (!ext4_es_is_delayed(es))
>  		return;
>  
>  	WARN_ON(len <= 0);
> @@ -1291,7 +1291,7 @@ static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end,
>  		es = rc->left_es;
>  		while (es && ext4_es_end(es) >=
>  		       EXT4_LBLK_CMASK(sbi, rc->first_do_lblk)) {
> -			if (ext4_es_is_delonly(es)) {
> +			if (ext4_es_is_delayed(es)) {
>  				rc->ndelonly_cluster--;
>  				left_delonly = true;
>  				break;
> @@ -1311,7 +1311,7 @@ static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end,
>  			}
>  			while (es && es->es_lblk <=
>  			       EXT4_LBLK_CFILL(sbi, rc->last_do_lblk)) {
> -				if (ext4_es_is_delonly(es)) {
> +				if (ext4_es_is_delayed(es)) {
>  					rc->ndelonly_cluster--;
>  					right_delonly = true;
>  					break;
> @@ -2239,7 +2239,7 @@ static int __revise_pending(struct inode *inode, ext4_lblk_t lblk,
>  	if (EXT4_B2C(sbi, lblk) == EXT4_B2C(sbi, end)) {
>  		first = EXT4_LBLK_CMASK(sbi, lblk);
>  		if (first != lblk)
> -			f_del = __es_scan_range(inode, &ext4_es_is_delonly,
> +			f_del = __es_scan_range(inode, &ext4_es_is_delayed,
>  						first, lblk - 1);
>  		if (f_del) {
>  			ret = __insert_pending(inode, first, prealloc);
> @@ -2251,7 +2251,7 @@ static int __revise_pending(struct inode *inode, ext4_lblk_t lblk,
>  			       sbi->s_cluster_ratio - 1;
>  			if (last != end)
>  				l_del = __es_scan_range(inode,
> -							&ext4_es_is_delonly,
> +							&ext4_es_is_delayed,
>  							end + 1, last);
>  			if (l_del) {
>  				ret = __insert_pending(inode, last, prealloc);
> @@ -2264,7 +2264,7 @@ static int __revise_pending(struct inode *inode, ext4_lblk_t lblk,
>  	} else {
>  		first = EXT4_LBLK_CMASK(sbi, lblk);
>  		if (first != lblk)
> -			f_del = __es_scan_range(inode, &ext4_es_is_delonly,
> +			f_del = __es_scan_range(inode, &ext4_es_is_delayed,
>  						first, lblk - 1);
>  		if (f_del) {
>  			ret = __insert_pending(inode, first, prealloc);
> @@ -2276,7 +2276,7 @@ static int __revise_pending(struct inode *inode, ext4_lblk_t lblk,
>  
>  		last = EXT4_LBLK_CMASK(sbi, end) + sbi->s_cluster_ratio - 1;
>  		if (last != end)
> -			l_del = __es_scan_range(inode, &ext4_es_is_delonly,
> +			l_del = __es_scan_range(inode, &ext4_es_is_delayed,
>  						end + 1, last);
>  		if (l_del) {
>  			ret = __insert_pending(inode, last, prealloc);
> diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
> index 5b49cb3b9aff..e484c60e55e3 100644
> --- a/fs/ext4/extents_status.h
> +++ b/fs/ext4/extents_status.h
> @@ -184,11 +184,6 @@ static inline int ext4_es_is_mapped(struct extent_status *es)
>  	return (ext4_es_is_written(es) || ext4_es_is_unwritten(es));
>  }
>  
> -static inline int ext4_es_is_delonly(struct extent_status *es)
> -{
> -	return (ext4_es_is_delayed(es) && !ext4_es_is_unwritten(es));
> -}
> -
>  static inline void ext4_es_set_referenced(struct extent_status *es)
>  {
>  	es->es_pblk |= ((ext4_fsblk_t)EXTENT_STATUS_REFERENCED) << ES_SHIFT;
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 8bd65a45a26a..2b301c165468 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1645,7 +1645,7 @@ static int ext4_clu_alloc_state(struct inode *inode, ext4_lblk_t lblk)
>  	int ret;
>  
>  	/* Has delalloc reservation? */
> -	if (ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk))
> +	if (ext4_es_scan_clu(inode, &ext4_es_is_delayed, lblk))
>  		return 1;
>  
>  	/* Already been allocated? */
> @@ -1766,7 +1766,7 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map)
>  		 * Delayed extent could be allocated by fallocate.
>  		 * So we need to check it.
>  		 */
> -		if (ext4_es_is_delonly(&es)) {
> +		if (ext4_es_is_delayed(&es)) {
>  			map->m_flags |= EXT4_MAP_DELAYED;
>  			return 0;
>  		}
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 06/10] ext4: update delalloc data reserve spcae in ext4_es_insert_extent()
  2024-08-07 17:41   ` Jan Kara
@ 2024-08-08 11:18     ` Zhang Yi
  2024-08-08 18:36       ` Jan Kara
  0 siblings, 1 reply; 25+ messages in thread
From: Zhang Yi @ 2024-08-08 11:18 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	ritesh.list, yi.zhang, chengzhihao1, yukuai3

On 2024/8/8 1:41, Jan Kara wrote:
> On Fri 02-08-24 19:51:16, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> Now that we update data reserved space for delalloc after allocating
>> new blocks in ext4_{ind|ext}_map_blocks(), and if bigalloc feature is
>> enabled, we also need to query the extents_status tree to calculate the
>> exact reserved clusters. This is complicated now and it appears that
>> it's better to do this job in ext4_es_insert_extent(), because
>> __es_remove_extent() have already count delalloc blocks when removing
>> delalloc extents and __revise_pending() return new adding pending count,
>> we could update the reserved blocks easily in ext4_es_insert_extent().
>>
>> Thers is one special case needs to concern is the quota claiming, when
>> bigalloc is enabled, if the delayed cluster allocation has been raced
>> by another no-delayed allocation(e.g. from fallocate) which doesn't
>> cover the delayed blocks:
>>
>>   |<       one cluster       >|
>>   hhhhhhhhhhhhhhhhhhhdddddddddd
>>   ^            ^
>>   |<          >| < fallocate this range, don't claim quota again
>>
>> We can't claim quota as usual because the fallocate has already claimed
>> it in ext4_mb_new_blocks(), we could notice this case through the
>> removed delalloc blocks count.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> ...
>> @@ -926,9 +928,27 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
>>  			__free_pending(pr);
>>  			pr = NULL;
>>  		}
>> +		pending = err3;
>>  	}
>>  error:
>>  	write_unlock(&EXT4_I(inode)->i_es_lock);
>> +	/*
>> +	 * Reduce the reserved cluster count to reflect successful deferred
>> +	 * allocation of delayed allocated clusters or direct allocation of
>> +	 * clusters discovered to be delayed allocated.  Once allocated, a
>> +	 * cluster is not included in the reserved count.
>> +	 *
>> +	 * When bigalloc is enabled, allocating non-delayed allocated blocks
>> +	 * which belong to delayed allocated clusters (from fallocate, filemap,
>> +	 * DIO, or clusters allocated when delalloc has been disabled by
>> +	 * ext4_nonda_switch()). Quota has been claimed by ext4_mb_new_blocks(),
>> +	 * so release the quota reservations made for any previously delayed
>> +	 * allocated clusters.
>> +	 */
>> +	resv_used = rinfo.delonly_cluster + pending;
>> +	if (resv_used)
>> +		ext4_da_update_reserve_space(inode, resv_used,
>> +					     rinfo.delonly_block);
> 
> I'm not sure I understand here. We are inserting extent into extent status
> tree. We are replacing resv_used clusters worth of space with delayed
> allocation reservation with normally allocated clusters so we need to
> release the reservation (mballoc already reduced freeclusters counter).
> That I understand. In normal case we should also claim quota because we are
> converting from reserved into allocated state. Now if we allocated blocks
> under this range (e.g. from fallocate()) without
> EXT4_GET_BLOCKS_DELALLOC_RESERVE, we need to release quota reservation here
> instead of claiming it. But I fail to see how rinfo.delonly_block > 0 is
> related to whether EXT4_GET_BLOCKS_DELALLOC_RESERVE was set when allocating
> blocks for this extent or not.
> 

Oh, this is really complicated due to the bigalloc feature, please let me
explain it more clearly by listing all related situations.

There are 2 types of paths of allocating delayed/reserved cluster:
1. Normal case, normally allocate delayed clusters from the write back path.
2. Special case, allocate blocks under this delayed range, e.g. from
   fallocate().

There are 4 situations below:

A. bigalloc is disabled. This case is simple, after path 2, we don't need
   to distinguish path 1 and 2, when calling ext4_es_insert_extent(), we
   set EXT4_GET_BLOCKS_DELALLOC_RESERVE after EXT4_MAP_DELAYED bit is
   detected. If the flag is set, we must be replacing a delayed extent and
   rinfo.delonly_block must be > 0. So rinfo.delonly_block > 0 is equal
   to set EXT4_GET_BLOCKS_DELALLOC_RESERVE.

B. bigalloc is enabled, there a 3 sub-cases of allocating a delayed
   cluster:
B0.Allocating a whole delayed cluster, this case is the same to A.

     |<         one cluster       >|
     ddddddd+ddddddd+ddddddd+ddddddd
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ allocating the whole range

B1.Allocating delayed blocks in a reserved cluster, this case is the same
   to A, too.

     |<         one cluster       >|
     hhhhhhh+hhhhhhh+ddddddd+ddddddd
                             ^^^^^^^
                             allocating this range

B2.Allocating blocks which doesn't cover the delayed blocks in one reserved
   cluster,

     |<         one cluster       >|
     hhhhhhh+hhhhhhh+hhhhhhh+ddddddd
     ^^^^^^^
     fallocating this range

  This case must from path 2, which means allocating blocks without
  EXT4_GET_BLOCKS_DELALLOC_RESERVE. In this case, rinfo.delonly_block must
  be 0 since we are not replacing any delayed extents, so
  rinfo.delonly_block == 0 means allocate blocks without EXT4_MAP_DELAYED
  detected, which further means that EXT4_GET_BLOCKS_DELALLOC_RESERVE is
  not set. So I think we could use rinfo.delonly_block to identify this
  case.

As the cases listed above, I think we could use rinfo.delonly_block to
determine whether the EXT4_GET_BLOCKS_DELALLOC_RESERVE is set, so I use it
to determine if we need to claim quota or release quota.

> At this point it would seem much clearer if we passed flag to
> ext4_es_insert_extent() whether EXT4_GET_BLOCKS_DELALLOC_RESERVE was set
> when allocating extent or not instead of computing delonly_block and
> somehow infering from that. But maybe I miss some obvious reason why that
> is correct.
> 

Yeah, I agree that infer from computing delonly_block is little obscure
and not clear enough, passing a flag is a clearer solution, but we have
to pass one more parameter to ext4_es_insert_extent() which could only be
set or not set in the allocating path in ext4_map_create_blocks(), other
5 callers don't care about it (so they should always have no
EXT4_GET_BLOCKS_DELALLOC_RESERVE flag set theoretically).

I have no strong feeling of which one is better, which one do you perfer
after reading my explanation above?

Thanks,
Yi.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 09/10] ext4: drop ext4_es_is_delonly()
  2024-08-07 17:48   ` Jan Kara
@ 2024-08-08 11:21     ` Zhang Yi
  0 siblings, 0 replies; 25+ messages in thread
From: Zhang Yi @ 2024-08-08 11:21 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	ritesh.list, yi.zhang, chengzhihao1, yukuai3

On 2024/8/8 1:48, Jan Kara wrote:
> On Fri 02-08-24 19:51:19, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> Since we don't add delayed flag in unwritten extents, so there is no
>> difference between ext4_es_is_delayed() and ext4_es_is_delonly(),
>> just drop ext4_es_is_delonly().
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> 
> Looks good. But please also add assertion when inserting extent into status
> tree that only one of EXTENT_STATUS_WRITTEN, EXTENT_STATUS_UNWRITTEN,
> EXTENT_STATUS_DELAYED, and EXTENT_STATUS_HOLE is set.
> Also perhaps add comment to EXTENT_STATUS_DELAYED (and other) definition that
> these states are exclusive. Thanks!
> 

Sure, will do. Thanks for the suggestion,

Yi.

> 
>> ---
>>  fs/ext4/extents_status.c | 18 +++++++++---------
>>  fs/ext4/extents_status.h |  5 -----
>>  fs/ext4/inode.c          |  4 ++--
>>  3 files changed, 11 insertions(+), 16 deletions(-)
>>
>> diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
>> index e482ac818317..5fb0a02405ba 100644
>> --- a/fs/ext4/extents_status.c
>> +++ b/fs/ext4/extents_status.c
>> @@ -563,8 +563,8 @@ static int ext4_es_can_be_merged(struct extent_status *es1,
>>  	if (ext4_es_is_hole(es1))
>>  		return 1;
>>  
>> -	/* we need to check delayed extent is without unwritten status */
>> -	if (ext4_es_is_delayed(es1) && !ext4_es_is_unwritten(es1))
>> +	/* we need to check delayed extent */
>> +	if (ext4_es_is_delayed(es1))
>>  		return 1;
>>  
>>  	return 0;
>> @@ -1139,7 +1139,7 @@ static void count_rsvd(struct inode *inode, ext4_lblk_t lblk, long len,
>>  	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
>>  	ext4_lblk_t i, end, nclu;
>>  
>> -	if (!ext4_es_is_delonly(es))
>> +	if (!ext4_es_is_delayed(es))
>>  		return;
>>  
>>  	WARN_ON(len <= 0);
>> @@ -1291,7 +1291,7 @@ static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end,
>>  		es = rc->left_es;
>>  		while (es && ext4_es_end(es) >=
>>  		       EXT4_LBLK_CMASK(sbi, rc->first_do_lblk)) {
>> -			if (ext4_es_is_delonly(es)) {
>> +			if (ext4_es_is_delayed(es)) {
>>  				rc->ndelonly_cluster--;
>>  				left_delonly = true;
>>  				break;
>> @@ -1311,7 +1311,7 @@ static unsigned int get_rsvd(struct inode *inode, ext4_lblk_t end,
>>  			}
>>  			while (es && es->es_lblk <=
>>  			       EXT4_LBLK_CFILL(sbi, rc->last_do_lblk)) {
>> -				if (ext4_es_is_delonly(es)) {
>> +				if (ext4_es_is_delayed(es)) {
>>  					rc->ndelonly_cluster--;
>>  					right_delonly = true;
>>  					break;
>> @@ -2239,7 +2239,7 @@ static int __revise_pending(struct inode *inode, ext4_lblk_t lblk,
>>  	if (EXT4_B2C(sbi, lblk) == EXT4_B2C(sbi, end)) {
>>  		first = EXT4_LBLK_CMASK(sbi, lblk);
>>  		if (first != lblk)
>> -			f_del = __es_scan_range(inode, &ext4_es_is_delonly,
>> +			f_del = __es_scan_range(inode, &ext4_es_is_delayed,
>>  						first, lblk - 1);
>>  		if (f_del) {
>>  			ret = __insert_pending(inode, first, prealloc);
>> @@ -2251,7 +2251,7 @@ static int __revise_pending(struct inode *inode, ext4_lblk_t lblk,
>>  			       sbi->s_cluster_ratio - 1;
>>  			if (last != end)
>>  				l_del = __es_scan_range(inode,
>> -							&ext4_es_is_delonly,
>> +							&ext4_es_is_delayed,
>>  							end + 1, last);
>>  			if (l_del) {
>>  				ret = __insert_pending(inode, last, prealloc);
>> @@ -2264,7 +2264,7 @@ static int __revise_pending(struct inode *inode, ext4_lblk_t lblk,
>>  	} else {
>>  		first = EXT4_LBLK_CMASK(sbi, lblk);
>>  		if (first != lblk)
>> -			f_del = __es_scan_range(inode, &ext4_es_is_delonly,
>> +			f_del = __es_scan_range(inode, &ext4_es_is_delayed,
>>  						first, lblk - 1);
>>  		if (f_del) {
>>  			ret = __insert_pending(inode, first, prealloc);
>> @@ -2276,7 +2276,7 @@ static int __revise_pending(struct inode *inode, ext4_lblk_t lblk,
>>  
>>  		last = EXT4_LBLK_CMASK(sbi, end) + sbi->s_cluster_ratio - 1;
>>  		if (last != end)
>> -			l_del = __es_scan_range(inode, &ext4_es_is_delonly,
>> +			l_del = __es_scan_range(inode, &ext4_es_is_delayed,
>>  						end + 1, last);
>>  		if (l_del) {
>>  			ret = __insert_pending(inode, last, prealloc);
>> diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
>> index 5b49cb3b9aff..e484c60e55e3 100644
>> --- a/fs/ext4/extents_status.h
>> +++ b/fs/ext4/extents_status.h
>> @@ -184,11 +184,6 @@ static inline int ext4_es_is_mapped(struct extent_status *es)
>>  	return (ext4_es_is_written(es) || ext4_es_is_unwritten(es));
>>  }
>>  
>> -static inline int ext4_es_is_delonly(struct extent_status *es)
>> -{
>> -	return (ext4_es_is_delayed(es) && !ext4_es_is_unwritten(es));
>> -}
>> -
>>  static inline void ext4_es_set_referenced(struct extent_status *es)
>>  {
>>  	es->es_pblk |= ((ext4_fsblk_t)EXTENT_STATUS_REFERENCED) << ES_SHIFT;
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 8bd65a45a26a..2b301c165468 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -1645,7 +1645,7 @@ static int ext4_clu_alloc_state(struct inode *inode, ext4_lblk_t lblk)
>>  	int ret;
>>  
>>  	/* Has delalloc reservation? */
>> -	if (ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk))
>> +	if (ext4_es_scan_clu(inode, &ext4_es_is_delayed, lblk))
>>  		return 1;
>>  
>>  	/* Already been allocated? */
>> @@ -1766,7 +1766,7 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map)
>>  		 * Delayed extent could be allocated by fallocate.
>>  		 * So we need to check it.
>>  		 */
>> -		if (ext4_es_is_delonly(&es)) {
>> +		if (ext4_es_is_delayed(&es)) {
>>  			map->m_flags |= EXT4_MAP_DELAYED;
>>  			return 0;
>>  		}
>> -- 
>> 2.39.2
>>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 06/10] ext4: update delalloc data reserve spcae in ext4_es_insert_extent()
  2024-08-08 11:18     ` Zhang Yi
@ 2024-08-08 18:36       ` Jan Kara
  2024-08-09  3:35         ` Zhang Yi
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kara @ 2024-08-08 18:36 UTC (permalink / raw)
  To: Zhang Yi
  Cc: Jan Kara, linux-ext4, linux-fsdevel, linux-kernel, tytso,
	adilger.kernel, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Thu 08-08-24 19:18:30, Zhang Yi wrote:
> On 2024/8/8 1:41, Jan Kara wrote:
> > On Fri 02-08-24 19:51:16, Zhang Yi wrote:
> >> From: Zhang Yi <yi.zhang@huawei.com>
> >>
> >> Now that we update data reserved space for delalloc after allocating
> >> new blocks in ext4_{ind|ext}_map_blocks(), and if bigalloc feature is
> >> enabled, we also need to query the extents_status tree to calculate the
> >> exact reserved clusters. This is complicated now and it appears that
> >> it's better to do this job in ext4_es_insert_extent(), because
> >> __es_remove_extent() have already count delalloc blocks when removing
> >> delalloc extents and __revise_pending() return new adding pending count,
> >> we could update the reserved blocks easily in ext4_es_insert_extent().
> >>
> >> Thers is one special case needs to concern is the quota claiming, when
> >> bigalloc is enabled, if the delayed cluster allocation has been raced
> >> by another no-delayed allocation(e.g. from fallocate) which doesn't
> >> cover the delayed blocks:
> >>
> >>   |<       one cluster       >|
> >>   hhhhhhhhhhhhhhhhhhhdddddddddd
> >>   ^            ^
> >>   |<          >| < fallocate this range, don't claim quota again
> >>
> >> We can't claim quota as usual because the fallocate has already claimed
> >> it in ext4_mb_new_blocks(), we could notice this case through the
> >> removed delalloc blocks count.
> >>
> >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> > ...
> >> @@ -926,9 +928,27 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
> >>  			__free_pending(pr);
> >>  			pr = NULL;
> >>  		}
> >> +		pending = err3;
> >>  	}
> >>  error:
> >>  	write_unlock(&EXT4_I(inode)->i_es_lock);
> >> +	/*
> >> +	 * Reduce the reserved cluster count to reflect successful deferred
> >> +	 * allocation of delayed allocated clusters or direct allocation of
> >> +	 * clusters discovered to be delayed allocated.  Once allocated, a
> >> +	 * cluster is not included in the reserved count.
> >> +	 *
> >> +	 * When bigalloc is enabled, allocating non-delayed allocated blocks
> >> +	 * which belong to delayed allocated clusters (from fallocate, filemap,
> >> +	 * DIO, or clusters allocated when delalloc has been disabled by
> >> +	 * ext4_nonda_switch()). Quota has been claimed by ext4_mb_new_blocks(),
> >> +	 * so release the quota reservations made for any previously delayed
> >> +	 * allocated clusters.
> >> +	 */
> >> +	resv_used = rinfo.delonly_cluster + pending;
> >> +	if (resv_used)
> >> +		ext4_da_update_reserve_space(inode, resv_used,
> >> +					     rinfo.delonly_block);
> > 
> > I'm not sure I understand here. We are inserting extent into extent status
> > tree. We are replacing resv_used clusters worth of space with delayed
> > allocation reservation with normally allocated clusters so we need to
> > release the reservation (mballoc already reduced freeclusters counter).
> > That I understand. In normal case we should also claim quota because we are
> > converting from reserved into allocated state. Now if we allocated blocks
> > under this range (e.g. from fallocate()) without
> > EXT4_GET_BLOCKS_DELALLOC_RESERVE, we need to release quota reservation here
> > instead of claiming it. But I fail to see how rinfo.delonly_block > 0 is
> > related to whether EXT4_GET_BLOCKS_DELALLOC_RESERVE was set when allocating
> > blocks for this extent or not.
> 
> Oh, this is really complicated due to the bigalloc feature, please let me
> explain it more clearly by listing all related situations.
> 
> There are 2 types of paths of allocating delayed/reserved cluster:
> 1. Normal case, normally allocate delayed clusters from the write back path.
> 2. Special case, allocate blocks under this delayed range, e.g. from
>    fallocate().
> 
> There are 4 situations below:
> 
> A. bigalloc is disabled. This case is simple, after path 2, we don't need
>    to distinguish path 1 and 2, when calling ext4_es_insert_extent(), we
>    set EXT4_GET_BLOCKS_DELALLOC_RESERVE after EXT4_MAP_DELAYED bit is
>    detected. If the flag is set, we must be replacing a delayed extent and
>    rinfo.delonly_block must be > 0. So rinfo.delonly_block > 0 is equal
>    to set EXT4_GET_BLOCKS_DELALLOC_RESERVE.

Right. So fallocate() will call ext4_map_blocks() and
ext4_es_lookup_extent() will find delayed extent and set EXT4_MAP_DELAYED
which you (due to patch 2 of this series) transform into
EXT4_GET_BLOCKS_DELALLOC_RESERVE. We used to update the delalloc
accounting through in ext4_ext_map_blocks() but this patch moved the update
to ext4_es_insert_extent(). But there is one cornercase even here AFAICT:

Suppose fallocate is called for range 0..16k, we have delalloc extent at
8k..16k. In this case ext4_map_blocks() at block 0 will not find the
delalloc extent but ext4_ext_map_blocks() will allocate 16k from mballoc
without using delalloc reservation but then ext4_es_insert_extent() will
still have rinfo.delonly_block > 0 so we claim the quota reservation
instead of releasing it?

> B. bigalloc is enabled, there a 3 sub-cases of allocating a delayed
>    cluster:
> B0.Allocating a whole delayed cluster, this case is the same to A.
> 
>      |<         one cluster       >|
>      ddddddd+ddddddd+ddddddd+ddddddd
>      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ allocating the whole range

I agree. In this case there's no difference.

 
> B1.Allocating delayed blocks in a reserved cluster, this case is the same
>    to A, too.
> 
>      |<         one cluster       >|
>      hhhhhhh+hhhhhhh+ddddddd+ddddddd
>                              ^^^^^^^
>                              allocating this range

Yes, if the allocation starts within delalloc range, we will have
EXT4_GET_BLOCKS_DELALLOC_RESERVE set and ndelonly_blocks will always be >
0.

> B2.Allocating blocks which doesn't cover the delayed blocks in one reserved
>    cluster,
> 
>      |<         one cluster       >|
>      hhhhhhh+hhhhhhh+hhhhhhh+ddddddd
>      ^^^^^^^
>      fallocating this range
> 
>   This case must from path 2, which means allocating blocks without
>   EXT4_GET_BLOCKS_DELALLOC_RESERVE. In this case, rinfo.delonly_block must
>   be 0 since we are not replacing any delayed extents, so
>   rinfo.delonly_block == 0 means allocate blocks without EXT4_MAP_DELAYED
>   detected, which further means that EXT4_GET_BLOCKS_DELALLOC_RESERVE is
>   not set. So I think we could use rinfo.delonly_block to identify this
>   case.

Well, this is similar to the non-bigalloc case I was asking about above.
Why the allocated unwritten extent cannot extend past the start of delalloc
extent? I didn't find anything that would disallow that...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 06/10] ext4: update delalloc data reserve spcae in ext4_es_insert_extent()
  2024-08-08 18:36       ` Jan Kara
@ 2024-08-09  3:35         ` Zhang Yi
  2024-08-09 16:20           ` Jan Kara
  0 siblings, 1 reply; 25+ messages in thread
From: Zhang Yi @ 2024-08-09  3:35 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	ritesh.list, yi.zhang, chengzhihao1, yukuai3

On 2024/8/9 2:36, Jan Kara wrote:
> On Thu 08-08-24 19:18:30, Zhang Yi wrote:
>> On 2024/8/8 1:41, Jan Kara wrote:
>>> On Fri 02-08-24 19:51:16, Zhang Yi wrote:
>>>> From: Zhang Yi <yi.zhang@huawei.com>
>>>>
>>>> Now that we update data reserved space for delalloc after allocating
>>>> new blocks in ext4_{ind|ext}_map_blocks(), and if bigalloc feature is
>>>> enabled, we also need to query the extents_status tree to calculate the
>>>> exact reserved clusters. This is complicated now and it appears that
>>>> it's better to do this job in ext4_es_insert_extent(), because
>>>> __es_remove_extent() have already count delalloc blocks when removing
>>>> delalloc extents and __revise_pending() return new adding pending count,
>>>> we could update the reserved blocks easily in ext4_es_insert_extent().
>>>>
>>>> Thers is one special case needs to concern is the quota claiming, when
>>>> bigalloc is enabled, if the delayed cluster allocation has been raced
>>>> by another no-delayed allocation(e.g. from fallocate) which doesn't
>>>> cover the delayed blocks:
>>>>
>>>>   |<       one cluster       >|
>>>>   hhhhhhhhhhhhhhhhhhhdddddddddd
>>>>   ^            ^
>>>>   |<          >| < fallocate this range, don't claim quota again
>>>>
>>>> We can't claim quota as usual because the fallocate has already claimed
>>>> it in ext4_mb_new_blocks(), we could notice this case through the
>>>> removed delalloc blocks count.
>>>>
>>>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>>> ...
>>>> @@ -926,9 +928,27 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
>>>>  			__free_pending(pr);
>>>>  			pr = NULL;
>>>>  		}
>>>> +		pending = err3;
>>>>  	}
>>>>  error:
>>>>  	write_unlock(&EXT4_I(inode)->i_es_lock);
>>>> +	/*
>>>> +	 * Reduce the reserved cluster count to reflect successful deferred
>>>> +	 * allocation of delayed allocated clusters or direct allocation of
>>>> +	 * clusters discovered to be delayed allocated.  Once allocated, a
>>>> +	 * cluster is not included in the reserved count.
>>>> +	 *
>>>> +	 * When bigalloc is enabled, allocating non-delayed allocated blocks
>>>> +	 * which belong to delayed allocated clusters (from fallocate, filemap,
>>>> +	 * DIO, or clusters allocated when delalloc has been disabled by
>>>> +	 * ext4_nonda_switch()). Quota has been claimed by ext4_mb_new_blocks(),
>>>> +	 * so release the quota reservations made for any previously delayed
>>>> +	 * allocated clusters.
>>>> +	 */
>>>> +	resv_used = rinfo.delonly_cluster + pending;
>>>> +	if (resv_used)
>>>> +		ext4_da_update_reserve_space(inode, resv_used,
>>>> +					     rinfo.delonly_block);
>>>
>>> I'm not sure I understand here. We are inserting extent into extent status
>>> tree. We are replacing resv_used clusters worth of space with delayed
>>> allocation reservation with normally allocated clusters so we need to
>>> release the reservation (mballoc already reduced freeclusters counter).
>>> That I understand. In normal case we should also claim quota because we are
>>> converting from reserved into allocated state. Now if we allocated blocks
>>> under this range (e.g. from fallocate()) without
>>> EXT4_GET_BLOCKS_DELALLOC_RESERVE, we need to release quota reservation here
>>> instead of claiming it. But I fail to see how rinfo.delonly_block > 0 is
>>> related to whether EXT4_GET_BLOCKS_DELALLOC_RESERVE was set when allocating
>>> blocks for this extent or not.
>>
>> Oh, this is really complicated due to the bigalloc feature, please let me
>> explain it more clearly by listing all related situations.
>>
>> There are 2 types of paths of allocating delayed/reserved cluster:
>> 1. Normal case, normally allocate delayed clusters from the write back path.
>> 2. Special case, allocate blocks under this delayed range, e.g. from
>>    fallocate().
>>
>> There are 4 situations below:
>>
>> A. bigalloc is disabled. This case is simple, after path 2, we don't need
>>    to distinguish path 1 and 2, when calling ext4_es_insert_extent(), we
>>    set EXT4_GET_BLOCKS_DELALLOC_RESERVE after EXT4_MAP_DELAYED bit is
>>    detected. If the flag is set, we must be replacing a delayed extent and
>>    rinfo.delonly_block must be > 0. So rinfo.delonly_block > 0 is equal
>>    to set EXT4_GET_BLOCKS_DELALLOC_RESERVE.
> 
> Right. So fallocate() will call ext4_map_blocks() and
> ext4_es_lookup_extent() will find delayed extent and set EXT4_MAP_DELAYED
> which you (due to patch 2 of this series) transform into
> EXT4_GET_BLOCKS_DELALLOC_RESERVE. We used to update the delalloc
> accounting through in ext4_ext_map_blocks() but this patch moved the update
> to ext4_es_insert_extent(). But there is one cornercase even here AFAICT:
> 
> Suppose fallocate is called for range 0..16k, we have delalloc extent at
> 8k..16k. In this case ext4_map_blocks() at block 0 will not find the
> delalloc extent but ext4_ext_map_blocks() will allocate 16k from mballoc
> without using delalloc reservation but then ext4_es_insert_extent() will
> still have rinfo.delonly_block > 0 so we claim the quota reservation
> instead of releasing it?
> 

After commit 6430dea07e85 ("ext4: correct the hole length returned by
ext4_map_blocks()"), the fallocate range 0-16K would be divided into two
rounds. When we first calling ext4_map_blocks() with 0-16K, the map range
will be corrected to 0-8k by ext4_ext_determine_insert_hole() and the
allocating range should not cover any delayed range. Then
ext4_alloc_file_blocks() will call ext4_map_blocks() again to allocate
8K-16K in the second round, in this round, we are allocating a real
delayed range. Please below graph for details,

ext4_alloc_file_blocks() //0-16K
 ext4_map_blocks()  //0-16K
  ext4_es_lookup_extent() //find nothing
   ext4_ext_map_blocks(0)
    ext4_ext_determine_insert_hole() //change map range to 0-8K
   ext4_ext_map_blocks(EXT4_GET_BLOCKS_CREATE) //allocate blocks under hole
 ext4_map_blocks()  //8-16K
  ext4_es_lookup_extent() //find delayed extent
  ext4_ext_map_blocks(EXT4_GET_BLOCKS_CREATE)
    //allocate blocks under a whole delayed range,
    //use rinfo.delonly_block > 0 is okay

Hence the allocating range can't mixed with delayed and non-delayed extent
at a time, and the rinfo.delonly_block > 0 should work.

>> B. bigalloc is enabled, there a 3 sub-cases of allocating a delayed
>>    cluster:
>> B0.Allocating a whole delayed cluster, this case is the same to A.
>>
>>      |<         one cluster       >|
>>      ddddddd+ddddddd+ddddddd+ddddddd
>>      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ allocating the whole range
> 
> I agree. In this case there's no difference.
> 
>  
>> B1.Allocating delayed blocks in a reserved cluster, this case is the same
>>    to A, too.
>>
>>      |<         one cluster       >|
>>      hhhhhhh+hhhhhhh+ddddddd+ddddddd
>>                              ^^^^^^^
>>                              allocating this range
> 
> Yes, if the allocation starts within delalloc range, we will have
> EXT4_GET_BLOCKS_DELALLOC_RESERVE set and ndelonly_blocks will always be >
> 0.
> 
>> B2.Allocating blocks which doesn't cover the delayed blocks in one reserved
>>    cluster,
>>
>>      |<         one cluster       >|
>>      hhhhhhh+hhhhhhh+hhhhhhh+ddddddd
>>      ^^^^^^^
>>      fallocating this range
>>
>>   This case must from path 2, which means allocating blocks without
>>   EXT4_GET_BLOCKS_DELALLOC_RESERVE. In this case, rinfo.delonly_block must
>>   be 0 since we are not replacing any delayed extents, so
>>   rinfo.delonly_block == 0 means allocate blocks without EXT4_MAP_DELAYED
>>   detected, which further means that EXT4_GET_BLOCKS_DELALLOC_RESERVE is
>>   not set. So I think we could use rinfo.delonly_block to identify this
>>   case.
> 
> Well, this is similar to the non-bigalloc case I was asking about above.
> Why the allocated unwritten extent cannot extend past the start of delalloc
> extent? I didn't find anything that would disallow that...
> 

The same to above, ext4_ext_determine_insert_hole() should work fine for
this case.

Thanks,
Yi.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 06/10] ext4: update delalloc data reserve spcae in ext4_es_insert_extent()
  2024-08-09  3:35         ` Zhang Yi
@ 2024-08-09 16:20           ` Jan Kara
  2024-08-10  4:01             ` Zhang Yi
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kara @ 2024-08-09 16:20 UTC (permalink / raw)
  To: Zhang Yi
  Cc: Jan Kara, linux-ext4, linux-fsdevel, linux-kernel, tytso,
	adilger.kernel, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Fri 09-08-24 11:35:49, Zhang Yi wrote:
> On 2024/8/9 2:36, Jan Kara wrote:
> > On Thu 08-08-24 19:18:30, Zhang Yi wrote:
> >> On 2024/8/8 1:41, Jan Kara wrote:
> >>> On Fri 02-08-24 19:51:16, Zhang Yi wrote:
> >>>> From: Zhang Yi <yi.zhang@huawei.com>
> >>>>
> >>>> Now that we update data reserved space for delalloc after allocating
> >>>> new blocks in ext4_{ind|ext}_map_blocks(), and if bigalloc feature is
> >>>> enabled, we also need to query the extents_status tree to calculate the
> >>>> exact reserved clusters. This is complicated now and it appears that
> >>>> it's better to do this job in ext4_es_insert_extent(), because
> >>>> __es_remove_extent() have already count delalloc blocks when removing
> >>>> delalloc extents and __revise_pending() return new adding pending count,
> >>>> we could update the reserved blocks easily in ext4_es_insert_extent().
> >>>>
> >>>> Thers is one special case needs to concern is the quota claiming, when
> >>>> bigalloc is enabled, if the delayed cluster allocation has been raced
> >>>> by another no-delayed allocation(e.g. from fallocate) which doesn't
> >>>> cover the delayed blocks:
> >>>>
> >>>>   |<       one cluster       >|
> >>>>   hhhhhhhhhhhhhhhhhhhdddddddddd
> >>>>   ^            ^
> >>>>   |<          >| < fallocate this range, don't claim quota again
> >>>>
> >>>> We can't claim quota as usual because the fallocate has already claimed
> >>>> it in ext4_mb_new_blocks(), we could notice this case through the
> >>>> removed delalloc blocks count.
> >>>>
> >>>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> >>> ...
> >>>> @@ -926,9 +928,27 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
> >>>>  			__free_pending(pr);
> >>>>  			pr = NULL;
> >>>>  		}
> >>>> +		pending = err3;
> >>>>  	}
> >>>>  error:
> >>>>  	write_unlock(&EXT4_I(inode)->i_es_lock);
> >>>> +	/*
> >>>> +	 * Reduce the reserved cluster count to reflect successful deferred
> >>>> +	 * allocation of delayed allocated clusters or direct allocation of
> >>>> +	 * clusters discovered to be delayed allocated.  Once allocated, a
> >>>> +	 * cluster is not included in the reserved count.
> >>>> +	 *
> >>>> +	 * When bigalloc is enabled, allocating non-delayed allocated blocks
> >>>> +	 * which belong to delayed allocated clusters (from fallocate, filemap,
> >>>> +	 * DIO, or clusters allocated when delalloc has been disabled by
> >>>> +	 * ext4_nonda_switch()). Quota has been claimed by ext4_mb_new_blocks(),
> >>>> +	 * so release the quota reservations made for any previously delayed
> >>>> +	 * allocated clusters.
> >>>> +	 */
> >>>> +	resv_used = rinfo.delonly_cluster + pending;
> >>>> +	if (resv_used)
> >>>> +		ext4_da_update_reserve_space(inode, resv_used,
> >>>> +					     rinfo.delonly_block);
> >>>
> >>> I'm not sure I understand here. We are inserting extent into extent status
> >>> tree. We are replacing resv_used clusters worth of space with delayed
> >>> allocation reservation with normally allocated clusters so we need to
> >>> release the reservation (mballoc already reduced freeclusters counter).
> >>> That I understand. In normal case we should also claim quota because we are
> >>> converting from reserved into allocated state. Now if we allocated blocks
> >>> under this range (e.g. from fallocate()) without
> >>> EXT4_GET_BLOCKS_DELALLOC_RESERVE, we need to release quota reservation here
> >>> instead of claiming it. But I fail to see how rinfo.delonly_block > 0 is
> >>> related to whether EXT4_GET_BLOCKS_DELALLOC_RESERVE was set when allocating
> >>> blocks for this extent or not.
> >>
> >> Oh, this is really complicated due to the bigalloc feature, please let me
> >> explain it more clearly by listing all related situations.
> >>
> >> There are 2 types of paths of allocating delayed/reserved cluster:
> >> 1. Normal case, normally allocate delayed clusters from the write back path.
> >> 2. Special case, allocate blocks under this delayed range, e.g. from
> >>    fallocate().
> >>
> >> There are 4 situations below:
> >>
> >> A. bigalloc is disabled. This case is simple, after path 2, we don't need
> >>    to distinguish path 1 and 2, when calling ext4_es_insert_extent(), we
> >>    set EXT4_GET_BLOCKS_DELALLOC_RESERVE after EXT4_MAP_DELAYED bit is
> >>    detected. If the flag is set, we must be replacing a delayed extent and
> >>    rinfo.delonly_block must be > 0. So rinfo.delonly_block > 0 is equal
> >>    to set EXT4_GET_BLOCKS_DELALLOC_RESERVE.
> > 
> > Right. So fallocate() will call ext4_map_blocks() and
> > ext4_es_lookup_extent() will find delayed extent and set EXT4_MAP_DELAYED
> > which you (due to patch 2 of this series) transform into
> > EXT4_GET_BLOCKS_DELALLOC_RESERVE. We used to update the delalloc
> > accounting through in ext4_ext_map_blocks() but this patch moved the update
> > to ext4_es_insert_extent(). But there is one cornercase even here AFAICT:
> > 
> > Suppose fallocate is called for range 0..16k, we have delalloc extent at
> > 8k..16k. In this case ext4_map_blocks() at block 0 will not find the
> > delalloc extent but ext4_ext_map_blocks() will allocate 16k from mballoc
> > without using delalloc reservation but then ext4_es_insert_extent() will
> > still have rinfo.delonly_block > 0 so we claim the quota reservation
> > instead of releasing it?
> > 
> 
> After commit 6430dea07e85 ("ext4: correct the hole length returned by
> ext4_map_blocks()"), the fallocate range 0-16K would be divided into two
> rounds. When we first calling ext4_map_blocks() with 0-16K, the map range
> will be corrected to 0-8k by ext4_ext_determine_insert_hole() and the
> allocating range should not cover any delayed range.

Eww, subtle, subtle, subtle... And isn't this also racy? We drop i_data_sem
in ext4_map_blocks() after we do the initial lookup. So there can be some
changes to both the extent tree and extent status tree before we grab
i_data_sem again for the allocation. We hold inode_lock so there can be
only writeback and page faults racing with us but e.g. ext4_page_mkwrite()
-> block_page_mkwrite -> ext4_da_get_block_prep() -> ext4_da_map_blocks()
can add delayed extent into extent status tree in that window causing
breakage, can't it?

> Then
> ext4_alloc_file_blocks() will call ext4_map_blocks() again to allocate
> 8K-16K in the second round, in this round, we are allocating a real
> delayed range. Please below graph for details,
> 
> ext4_alloc_file_blocks() //0-16K
>  ext4_map_blocks()  //0-16K
>   ext4_es_lookup_extent() //find nothing
>    ext4_ext_map_blocks(0)
>     ext4_ext_determine_insert_hole() //change map range to 0-8K
>    ext4_ext_map_blocks(EXT4_GET_BLOCKS_CREATE) //allocate blocks under hole
>  ext4_map_blocks()  //8-16K
>   ext4_es_lookup_extent() //find delayed extent
>   ext4_ext_map_blocks(EXT4_GET_BLOCKS_CREATE)
>     //allocate blocks under a whole delayed range,
>     //use rinfo.delonly_block > 0 is okay
> 
> Hence the allocating range can't mixed with delayed and non-delayed extent
> at a time, and the rinfo.delonly_block > 0 should work.

Besides the race above I agree. So either we need to trim mapping extent in
ext4_map_blocks() after re-acquiring i_data_sem or we need to deal with
unwritten extents that are partially delalloc. I'm more and more leaning
towards just passing the information whether delalloc was used or not to
extent status tree insertion. Because that can deal with partial extents
just fine...

Thanks for your patience with me :).

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 06/10] ext4: update delalloc data reserve spcae in ext4_es_insert_extent()
  2024-08-09 16:20           ` Jan Kara
@ 2024-08-10  4:01             ` Zhang Yi
  0 siblings, 0 replies; 25+ messages in thread
From: Zhang Yi @ 2024-08-10  4:01 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	ritesh.list, yi.zhang, chengzhihao1, yukuai3

On 2024/8/10 0:20, Jan Kara wrote:
> On Fri 09-08-24 11:35:49, Zhang Yi wrote:
>> On 2024/8/9 2:36, Jan Kara wrote:
>>> On Thu 08-08-24 19:18:30, Zhang Yi wrote:
>>>> On 2024/8/8 1:41, Jan Kara wrote:
>>>>> On Fri 02-08-24 19:51:16, Zhang Yi wrote:
>>>>>> From: Zhang Yi <yi.zhang@huawei.com>
>>>>>>
>>>>>> Now that we update data reserved space for delalloc after allocating
>>>>>> new blocks in ext4_{ind|ext}_map_blocks(), and if bigalloc feature is
>>>>>> enabled, we also need to query the extents_status tree to calculate the
>>>>>> exact reserved clusters. This is complicated now and it appears that
>>>>>> it's better to do this job in ext4_es_insert_extent(), because
>>>>>> __es_remove_extent() have already count delalloc blocks when removing
>>>>>> delalloc extents and __revise_pending() return new adding pending count,
>>>>>> we could update the reserved blocks easily in ext4_es_insert_extent().
>>>>>>
>>>>>> Thers is one special case needs to concern is the quota claiming, when
>>>>>> bigalloc is enabled, if the delayed cluster allocation has been raced
>>>>>> by another no-delayed allocation(e.g. from fallocate) which doesn't
>>>>>> cover the delayed blocks:
>>>>>>
>>>>>>   |<       one cluster       >|
>>>>>>   hhhhhhhhhhhhhhhhhhhdddddddddd
>>>>>>   ^            ^
>>>>>>   |<          >| < fallocate this range, don't claim quota again
>>>>>>
>>>>>> We can't claim quota as usual because the fallocate has already claimed
>>>>>> it in ext4_mb_new_blocks(), we could notice this case through the
>>>>>> removed delalloc blocks count.
>>>>>>
>>>>>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>>>>> ...
>>>>>> @@ -926,9 +928,27 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
>>>>>>  			__free_pending(pr);
>>>>>>  			pr = NULL;
>>>>>>  		}
>>>>>> +		pending = err3;
>>>>>>  	}
>>>>>>  error:
>>>>>>  	write_unlock(&EXT4_I(inode)->i_es_lock);
>>>>>> +	/*
>>>>>> +	 * Reduce the reserved cluster count to reflect successful deferred
>>>>>> +	 * allocation of delayed allocated clusters or direct allocation of
>>>>>> +	 * clusters discovered to be delayed allocated.  Once allocated, a
>>>>>> +	 * cluster is not included in the reserved count.
>>>>>> +	 *
>>>>>> +	 * When bigalloc is enabled, allocating non-delayed allocated blocks
>>>>>> +	 * which belong to delayed allocated clusters (from fallocate, filemap,
>>>>>> +	 * DIO, or clusters allocated when delalloc has been disabled by
>>>>>> +	 * ext4_nonda_switch()). Quota has been claimed by ext4_mb_new_blocks(),
>>>>>> +	 * so release the quota reservations made for any previously delayed
>>>>>> +	 * allocated clusters.
>>>>>> +	 */
>>>>>> +	resv_used = rinfo.delonly_cluster + pending;
>>>>>> +	if (resv_used)
>>>>>> +		ext4_da_update_reserve_space(inode, resv_used,
>>>>>> +					     rinfo.delonly_block);
>>>>>
>>>>> I'm not sure I understand here. We are inserting extent into extent status
>>>>> tree. We are replacing resv_used clusters worth of space with delayed
>>>>> allocation reservation with normally allocated clusters so we need to
>>>>> release the reservation (mballoc already reduced freeclusters counter).
>>>>> That I understand. In normal case we should also claim quota because we are
>>>>> converting from reserved into allocated state. Now if we allocated blocks
>>>>> under this range (e.g. from fallocate()) without
>>>>> EXT4_GET_BLOCKS_DELALLOC_RESERVE, we need to release quota reservation here
>>>>> instead of claiming it. But I fail to see how rinfo.delonly_block > 0 is
>>>>> related to whether EXT4_GET_BLOCKS_DELALLOC_RESERVE was set when allocating
>>>>> blocks for this extent or not.
>>>>
>>>> Oh, this is really complicated due to the bigalloc feature, please let me
>>>> explain it more clearly by listing all related situations.
>>>>
>>>> There are 2 types of paths of allocating delayed/reserved cluster:
>>>> 1. Normal case, normally allocate delayed clusters from the write back path.
>>>> 2. Special case, allocate blocks under this delayed range, e.g. from
>>>>    fallocate().
>>>>
>>>> There are 4 situations below:
>>>>
>>>> A. bigalloc is disabled. This case is simple, after path 2, we don't need
>>>>    to distinguish path 1 and 2, when calling ext4_es_insert_extent(), we
>>>>    set EXT4_GET_BLOCKS_DELALLOC_RESERVE after EXT4_MAP_DELAYED bit is
>>>>    detected. If the flag is set, we must be replacing a delayed extent and
>>>>    rinfo.delonly_block must be > 0. So rinfo.delonly_block > 0 is equal
>>>>    to set EXT4_GET_BLOCKS_DELALLOC_RESERVE.
>>>
>>> Right. So fallocate() will call ext4_map_blocks() and
>>> ext4_es_lookup_extent() will find delayed extent and set EXT4_MAP_DELAYED
>>> which you (due to patch 2 of this series) transform into
>>> EXT4_GET_BLOCKS_DELALLOC_RESERVE. We used to update the delalloc
>>> accounting through in ext4_ext_map_blocks() but this patch moved the update
>>> to ext4_es_insert_extent(). But there is one cornercase even here AFAICT:
>>>
>>> Suppose fallocate is called for range 0..16k, we have delalloc extent at
>>> 8k..16k. In this case ext4_map_blocks() at block 0 will not find the
>>> delalloc extent but ext4_ext_map_blocks() will allocate 16k from mballoc
>>> without using delalloc reservation but then ext4_es_insert_extent() will
>>> still have rinfo.delonly_block > 0 so we claim the quota reservation
>>> instead of releasing it?
>>>
>>
>> After commit 6430dea07e85 ("ext4: correct the hole length returned by
>> ext4_map_blocks()"), the fallocate range 0-16K would be divided into two
>> rounds. When we first calling ext4_map_blocks() with 0-16K, the map range
>> will be corrected to 0-8k by ext4_ext_determine_insert_hole() and the
>> allocating range should not cover any delayed range.
> 
> Eww, subtle, subtle, subtle... And isn't this also racy? We drop i_data_sem
> in ext4_map_blocks() after we do the initial lookup. So there can be some
> changes to both the extent tree and extent status tree before we grab
> i_data_sem again for the allocation. We hold inode_lock so there can be
> only writeback and page faults racing with us but e.g. ext4_page_mkwrite()
> -> block_page_mkwrite -> ext4_da_get_block_prep() -> ext4_da_map_blocks()
> can add delayed extent into extent status tree in that window causing
> breakage, can't it?

Oh! you are totally right, I missed that current ext4_fallocate() doesn't
hold invalidate_lock for the normal fallocate path, hence there's nothing
could prevent this race now, thanks a lot for pointing this out.

> 
>> Then
>> ext4_alloc_file_blocks() will call ext4_map_blocks() again to allocate
>> 8K-16K in the second round, in this round, we are allocating a real
>> delayed range. Please below graph for details,
>>
>> ext4_alloc_file_blocks() //0-16K
>>  ext4_map_blocks()  //0-16K
>>   ext4_es_lookup_extent() //find nothing
>>    ext4_ext_map_blocks(0)
>>     ext4_ext_determine_insert_hole() //change map range to 0-8K
>>    ext4_ext_map_blocks(EXT4_GET_BLOCKS_CREATE) //allocate blocks under hole
>>  ext4_map_blocks()  //8-16K
>>   ext4_es_lookup_extent() //find delayed extent
>>   ext4_ext_map_blocks(EXT4_GET_BLOCKS_CREATE)
>>     //allocate blocks under a whole delayed range,
>>     //use rinfo.delonly_block > 0 is okay
>>
>> Hence the allocating range can't mixed with delayed and non-delayed extent
>> at a time, and the rinfo.delonly_block > 0 should work.
> 
> Besides the race above I agree. So either we need to trim mapping extent in
> ext4_map_blocks() after re-acquiring i_data_sem

Yeah, if we keep on using this solution, it looks like we have to add similar
logic we've done in ext4_da_map_blocks() a few months ago into the begin of
the new helper ext4_map_create_blocks(). I guess it may expensive and not
worth now.

	if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
		map->m_len = min_t(unsigned int, map->m_len,
				   es.es_len - (map->m_lblk - es.es_lblk));
	} else
		retval = ext4_map_query_blocks(NULL, inode, map);
		...
	}

> or we need to deal with
> unwritten extents that are partially delalloc. I'm more and more leaning
> towards just passing the information whether delalloc was used or not to
> extent status tree insertion. Because that can deal with partial extents
> just fine...
> 

Yeah, I agree with you, passing the information to ext4_es_init_extent()
is simple and looks fine. I will change to use this solution.

> Thanks for your patience with me :).
> 

Anytime! I appreciate your review and suggestions as well. :)

Thanks,
Yi.


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2024-08-10  4:01 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-02 11:51 [PATCH v2 00/10] ext4: simplify the counting and management of delalloc reserved blocks Zhang Yi
2024-08-02 11:51 ` [PATCH v2 01/10] ext4: factor out ext4_map_create_blocks() to allocate new blocks Zhang Yi
2024-08-06 14:38   ` Jan Kara
2024-08-02 11:51 ` [PATCH v2 02/10] ext4: optimize the EXT4_GET_BLOCKS_DELALLOC_RESERVE flag set Zhang Yi
2024-08-06 14:48   ` Jan Kara
2024-08-02 11:51 ` [PATCH v2 03/10] ext4: don't set EXTENT_STATUS_DELAYED on allocated blocks Zhang Yi
2024-08-06 15:23   ` Jan Kara
2024-08-07 12:18     ` Zhang Yi
2024-08-07 13:37       ` Jan Kara
2024-08-02 11:51 ` [PATCH v2 04/10] ext4: let __revise_pending() return newly inserted pendings Zhang Yi
2024-08-02 11:51 ` [PATCH v2 05/10] ext4: count removed reserved blocks for delalloc only extent entry Zhang Yi
2024-08-02 11:51 ` [PATCH v2 06/10] ext4: update delalloc data reserve spcae in ext4_es_insert_extent() Zhang Yi
2024-08-07 17:41   ` Jan Kara
2024-08-08 11:18     ` Zhang Yi
2024-08-08 18:36       ` Jan Kara
2024-08-09  3:35         ` Zhang Yi
2024-08-09 16:20           ` Jan Kara
2024-08-10  4:01             ` Zhang Yi
2024-08-02 11:51 ` [PATCH v2 07/10] ext4: drop ext4_es_delayed_clu() Zhang Yi
2024-08-02 11:51 ` [PATCH v2 08/10] ext4: use ext4_map_query_blocks() in ext4_map_blocks() Zhang Yi
2024-08-07 17:43   ` Jan Kara
2024-08-02 11:51 ` [PATCH v2 09/10] ext4: drop ext4_es_is_delonly() Zhang Yi
2024-08-07 17:48   ` Jan Kara
2024-08-08 11:21     ` Zhang Yi
2024-08-02 11:51 ` [PATCH v2 10/10] ext4: drop all delonly descriptions Zhang Yi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).