From: Zhang Yi <yi.zhang@huaweicloud.com>
To: linux-ext4@vger.kernel.org
Cc: tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz,
yi.zhang@huawei.com, yi.zhang@huaweicloud.com,
chengzhihao1@huawei.com, yukuai3@huawei.com
Subject: [RFC PATCH 06/16] ext4: move delalloc data reserve space updating into ext4_es_insert_extent()
Date: Thu, 24 Aug 2023 17:26:09 +0800
Message-ID: <20230824092619.1327976-7-yi.zhang@huaweicloud.com>
In-Reply-To: <20230824092619.1327976-1-yi.zhang@huaweicloud.com>
From: Zhang Yi <yi.zhang@huawei.com>
We currently update the reserved data space for delalloc after allocating
new blocks in ext4_{ind|ext}_map_blocks(). If the bigalloc feature is
enabled, we also need to query the extents_status tree to calculate the
exact number of reserved clusters. If we move this update into
ext4_es_insert_extent(), just after dropping the delalloc extents_status
entry, the code becomes simpler: __es_remove_extent() has already done
most of the work, and ext4_es_delayed_clu() can be removed entirely.
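With that change, the end of ext4_es_insert_extent() drives the whole
update. A simplified sketch of the new flow (error handling and the retry
path are omitted; see the diff below for the real thing):

	delayed = status & EXTENT_STATUS_DELAYED;

	write_lock(&EXT4_I(inode)->i_es_lock);
	/* also returns the removed delayed-only blocks/clusters in rinfo */
	err1 = __es_remove_extent(inode, lblk, end, &rinfo, es1);
	...
	write_unlock(&EXT4_I(inode)->i_es_lock);

	/* single place to update the delalloc reserve and handle quota */
	ext4_da_update_reserve_space(inode, rinfo.ndelonly_clu + pending,
				     !delayed && rinfo.ndelonly_blk);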
One important thing to take care of is that, if bigalloc is enabled, we
should update the reserved data count when the first delayed-only es
entries of a cluster are converted, even if that cluster still has other
delayed-only entries left over.
|                     one cluster                      |
--------------------------------------------------------
| da es 0 | .. | da es 1 | .. | da es 2 | .. | da es 3 |
--------------------------------------------------------
               ^         ^
               |         | <- first allocating this delayed extent
Later allocations in that cluster must not be counted again; we achieve
this by counting the newly inserted pending clusters.
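In the code this count comes from the return value of __revise_pending(),
which, after patch 03, returns the number of newly inserted pending
reservations. A rough sketch (error handling trimmed, exact arguments as
in the earlier patches of this series):

	if (revise_pending) {
		err3 = __revise_pending(inode, lblk, len, &pr);
		if (err3 < 0)
			goto error;
		/*
		 * err3 is the number of newly inserted pending
		 * reservations, i.e. clusters converted for the first
		 * time; only these are added to the reserve update.
		 */
		pending = err3;
	}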
Another important thing is quota claiming and the i_blocks count. If the
delayed allocation has been raced by another non-delayed allocation (from
fallocate, filemap, DIO, ...), we cannot claim quota as usual because the
racer has already done it. We can distinguish this case by checking
EXTENT_STATUS_DELAYED together with the reserved-only blocks counted by
__es_remove_extent(). If EXTENT_STATUS_DELAYED is set, the allocation is
definitely not from the delayed allocation path. The opposite conclusion,
however, only holds when bigalloc is disabled. With bigalloc enabled, the
delayed allocation could be raced by a fallocate that writes to other,
non-delayed areas of the same cluster; in that case EXTENT_STATUS_DELAYED
is not set, but we still must not claim quota again.
|               one cluster                |
--------------------------------------------
|                         | delayed es     |
--------------------------------------------
^                         ^
|        fallocate        |
So we also need to check the counted reserved-only blocks: if the count is
zero, the allocation is not from the delayed allocation path, and we
should release the reserved quota instead of claiming it.
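Taken together, the decision boils down to the quota_claim argument of the
ext4_da_update_reserve_space() call at the end of ext4_es_insert_extent():

	/*
	 * Claim quota only when the inserted extent is not marked
	 * EXTENT_STATUS_DELAYED (so this really resolves a delayed
	 * allocation) and __es_remove_extent() dropped some
	 * reserved-only blocks; otherwise quota has already been
	 * claimed elsewhere and the reservation is released instead.
	 */
	ext4_da_update_reserve_space(inode, rinfo.ndelonly_clu + pending,
				     !delayed && rinfo.ndelonly_blk);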
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
fs/ext4/extents.c | 37 -------------
fs/ext4/extents_status.c | 115 +++++++++------------------------------
fs/ext4/extents_status.h | 2 -
fs/ext4/indirect.c | 7 ---
fs/ext4/inode.c | 5 +-
5 files changed, 30 insertions(+), 136 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index e4115d338f10..592383effe80 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4323,43 +4323,6 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
goto out;
}
- /*
- * Reduce the reserved cluster count to reflect successful deferred
- * allocation of delayed allocated clusters or direct allocation of
- * clusters discovered to be delayed allocated. Once allocated, a
- * cluster is not included in the reserved count.
- */
- if (test_opt(inode->i_sb, DELALLOC) && allocated_clusters) {
- if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) {
- /*
- * When allocating delayed allocated clusters, simply
- * reduce the reserved cluster count and claim quota
- */
- ext4_da_update_reserve_space(inode, allocated_clusters,
- 1);
- } else {
- ext4_lblk_t lblk, len;
- unsigned int n;
-
- /*
- * When allocating non-delayed allocated clusters
- * (from fallocate, filemap, DIO, or clusters
- * allocated when delalloc has been disabled by
- * ext4_nonda_switch), reduce the reserved cluster
- * count by the number of allocated clusters that
- * have previously been delayed allocated. Quota
- * has been claimed by ext4_mb_new_blocks() above,
- * so release the quota reservations made for any
- * previously delayed allocated clusters.
- */
- lblk = EXT4_LBLK_CMASK(sbi, map->m_lblk);
- len = allocated_clusters << sbi->s_cluster_bits;
- n = ext4_es_delayed_clu(inode, lblk, len);
- if (n > 0)
- ext4_da_update_reserve_space(inode, (int) n, 0);
- }
- }
-
/*
* Cache the extent and update transaction to commit on fdatasync only
* when it is _not_ an unwritten extent.
diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 62191c772b82..34164c2827f2 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -856,11 +856,14 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
struct extent_status newes;
ext4_lblk_t end = lblk + len - 1;
int err1 = 0, err2 = 0, err3 = 0;
+ struct rsvd_info rinfo;
+ int pending = 0;
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
struct extent_status *es1 = NULL;
struct extent_status *es2 = NULL;
struct pending_reservation *pr = NULL;
bool revise_pending = false;
+ bool delayed = false;
if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
return;
@@ -878,6 +881,7 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
* data lose, and the extent has been written, it's safe to remove
* the delayed flag even it's still delayed.
*/
+ delayed = status & EXTENT_STATUS_DELAYED;
if ((status & EXTENT_STATUS_DELAYED) &&
(status & EXTENT_STATUS_WRITTEN))
status &= ~EXTENT_STATUS_DELAYED;
@@ -902,7 +906,7 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
pr = __alloc_pending(true);
write_lock(&EXT4_I(inode)->i_es_lock);
- err1 = __es_remove_extent(inode, lblk, end, NULL, es1);
+ err1 = __es_remove_extent(inode, lblk, end, &rinfo, es1);
if (err1 != 0)
goto error;
/* Free preallocated extent if it didn't get used. */
@@ -932,9 +936,30 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
__free_pending(pr);
pr = NULL;
}
+ /*
+ * When first allocating part of the delayed extents in one
+ * cluster, we also need to count the data cluster when
+ * allocating delayed-only extent entries.
+ */
+ pending = err3;
}
error:
write_unlock(&EXT4_I(inode)->i_es_lock);
+ /*
+ * If EXTENT_STATUS_DELAYED is not set and delayed only blocks is
+ * not zero, we are allocating delayed allocated clusters, simply
+ * reduce the reserved cluster count and claim quota.
+ *
+ * Otherwise, we aren't allocating delayed allocated clusters
+ * (from fallocate, filemap, DIO, or clusters allocated when
+ * delalloc has been disabled by ext4_nonda_switch()), reduce the
+ * reserved cluster count by the number of allocated clusters that
+ * have previously been delayed allocated. Quota has been claimed
+ * by ext4_mb_new_blocks(), so release the quota reservations made
+ * for any previously delayed allocated clusters.
+ */
+ ext4_da_update_reserve_space(inode, rinfo.ndelonly_clu + pending,
+ !delayed && rinfo.ndelonly_blk);
if (err1 || err2 || err3 < 0)
goto retry;
@@ -2146,94 +2171,6 @@ void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
return;
}
-/*
- * __es_delayed_clu - count number of clusters containing blocks that
- * are delayed only
- *
- * @inode - file containing block range
- * @start - logical block defining start of range
- * @end - logical block defining end of range
- *
- * Returns the number of clusters containing only delayed (not delayed
- * and unwritten) blocks in the range specified by @start and @end. Any
- * cluster or part of a cluster within the range and containing a delayed
- * and not unwritten block within the range is counted as a whole cluster.
- */
-static unsigned int __es_delayed_clu(struct inode *inode, ext4_lblk_t start,
- ext4_lblk_t end)
-{
- struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
- struct extent_status *es;
- struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
- struct rb_node *node;
- ext4_lblk_t first_lclu, last_lclu;
- unsigned long long last_counted_lclu;
- unsigned int n = 0;
-
- /* guaranteed to be unequal to any ext4_lblk_t value */
- last_counted_lclu = ~0ULL;
-
- es = __es_tree_search(&tree->root, start);
-
- while (es && (es->es_lblk <= end)) {
- if (ext4_es_is_delonly(es)) {
- if (es->es_lblk <= start)
- first_lclu = EXT4_B2C(sbi, start);
- else
- first_lclu = EXT4_B2C(sbi, es->es_lblk);
-
- if (ext4_es_end(es) >= end)
- last_lclu = EXT4_B2C(sbi, end);
- else
- last_lclu = EXT4_B2C(sbi, ext4_es_end(es));
-
- if (first_lclu == last_counted_lclu)
- n += last_lclu - first_lclu;
- else
- n += last_lclu - first_lclu + 1;
- last_counted_lclu = last_lclu;
- }
- node = rb_next(&es->rb_node);
- if (!node)
- break;
- es = rb_entry(node, struct extent_status, rb_node);
- }
-
- return n;
-}
-
-/*
- * ext4_es_delayed_clu - count number of clusters containing blocks that
- * are both delayed and unwritten
- *
- * @inode - file containing block range
- * @lblk - logical block defining start of range
- * @len - number of blocks in range
- *
- * Locking for external use of __es_delayed_clu().
- */
-unsigned int ext4_es_delayed_clu(struct inode *inode, ext4_lblk_t lblk,
- ext4_lblk_t len)
-{
- struct ext4_inode_info *ei = EXT4_I(inode);
- ext4_lblk_t end;
- unsigned int n;
-
- if (len == 0)
- return 0;
-
- end = lblk + len - 1;
- WARN_ON(end < lblk);
-
- read_lock(&ei->i_es_lock);
-
- n = __es_delayed_clu(inode, lblk, end);
-
- read_unlock(&ei->i_es_lock);
-
- return n;
-}
-
/*
* __revise_pending - makes, cancels, or leaves unchanged pending cluster
* reservations for a specified block range depending
diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
index d9847a4a25db..7344667eb2cd 100644
--- a/fs/ext4/extents_status.h
+++ b/fs/ext4/extents_status.h
@@ -251,8 +251,6 @@ extern void ext4_remove_pending(struct inode *inode, ext4_lblk_t lblk);
extern bool ext4_is_pending(struct inode *inode, ext4_lblk_t lblk);
extern void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
bool allocated);
-extern unsigned int ext4_es_delayed_clu(struct inode *inode, ext4_lblk_t lblk,
- ext4_lblk_t len);
extern void ext4_clear_inode_es(struct inode *inode);
#endif /* _EXT4_EXTENTS_STATUS_H */
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index a9f3716119d3..448401e02c55 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -652,13 +652,6 @@ int ext4_ind_map_blocks(handle_t *handle, struct inode *inode,
ext4_update_inode_fsync_trans(handle, inode, 1);
count = ar.len;
- /*
- * Update reserved blocks/metadata blocks after successful block
- * allocation which had been deferred till now.
- */
- if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
- ext4_da_update_reserve_space(inode, count, 1);
-
got_it:
map->m_flags |= EXT4_MAP_MAPPED;
map->m_pblk = le32_to_cpu(chain[depth-1].key);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 82115d6656d3..546a3b09fd0a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -330,11 +330,14 @@ qsize_t *ext4_get_reserved_space(struct inode *inode)
* ext4_discard_preallocations() from here.
*/
void ext4_da_update_reserve_space(struct inode *inode,
- int used, int quota_claim)
+ int used, int quota_claim)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
struct ext4_inode_info *ei = EXT4_I(inode);
+ if (!used)
+ return;
+
spin_lock(&ei->i_block_reservation_lock);
trace_ext4_da_update_reserve_space(inode, used, quota_claim);
if (unlikely(used > ei->i_reserved_data_blocks)) {
--
2.39.2
Thread overview: 24+ messages
2023-08-24 9:26 [RFC PATCH 00/16] ext4: more accurate metadata reservation for delalloc mount option Zhang Yi
2023-08-24 9:26 ` [RFC PATCH 01/16] ext4: correct the start block of counting reserved clusters Zhang Yi
2023-08-30 13:10 ` Jan Kara
2023-10-06 2:33 ` Theodore Ts'o
2023-08-24 9:26 ` [RFC PATCH 02/16] ext4: make sure allocate pending entry not fail Zhang Yi
2023-08-30 13:25 ` Jan Kara
2023-10-06 2:33 ` Theodore Ts'o
2023-08-24 9:26 ` [RFC PATCH 03/16] ext4: let __revise_pending() return the number of new inserts pendings Zhang Yi
2023-08-24 9:26 ` [RFC PATCH 04/16] ext4: count removed reserved blocks for delalloc only es entry Zhang Yi
2023-08-24 9:26 ` [RFC PATCH 05/16] ext4: pass real delayed status into ext4_es_insert_extent() Zhang Yi
2023-08-24 9:26 ` Zhang Yi [this message]
2023-08-24 9:26 ` [RFC PATCH 07/16] ext4: count inode's total delalloc data blocks into ext4_es_tree Zhang Yi
2023-08-24 9:26 ` [RFC PATCH 08/16] ext4: refactor delalloc space reservation Zhang Yi
2023-08-24 9:26 ` [RFC PATCH 09/16] ext4: count reserved metadata blocks for delalloc per inode Zhang Yi
2023-08-24 9:26 ` [RFC PATCH 10/16] ext4: reserve meta blocks in ext4_da_reserve_space() Zhang Yi
2023-08-24 9:26 ` [RFC PATCH 11/16] ext4: factor out common part of ext4_da_{release|update_reserve}_space() Zhang Yi
2023-08-24 9:26 ` [RFC PATCH 12/16] ext4: update reserved meta blocks in ext4_da_{release|update_reserve}_space() Zhang Yi
2023-09-06 7:35 ` kernel test robot
2023-08-24 9:26 ` [RFC PATCH 13/16] ext4: calculate the worst extent blocks needed of a delalloc es entry Zhang Yi
2023-08-24 9:26 ` [RFC PATCH 14/16] ext4: reserve extent blocks for delalloc Zhang Yi
2023-08-24 9:26 ` [RFC PATCH 15/16] ext4: flush delalloc blocks if no free space Zhang Yi
2023-08-24 9:26 ` [RFC PATCH 16/16] ext4: drop ext4_nonda_switch() Zhang Yi
2023-08-30 15:30 ` [RFC PATCH 00/16] ext4: more accurate metadata reservation for delalloc mount option Jan Kara
2023-09-01 2:33 ` Zhang Yi