* [PATCH v6 00/11] cleanups and unit test for mballoc
@ 2023-08-26 15:50 Kemeng Shi
2023-08-26 15:50 ` [PATCH v6 01/11] ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb Kemeng Shi
From: Kemeng Shi @ 2023-08-26 15:50 UTC
To: tytso, adilger.kernel, ritesh.list, linux-ext4, linux-kernel
v5->v6:
1. Separate block bitmap and buddy bitmap freeing into individual patches
and rewrite the descriptions.
2. Remove the #ifdef around KUNIT_STATIC_STUB_REDIRECT, which is safe to
use unconditionally after fix [7] (merged into kunit-next/kunit) since it
only takes effect when CONFIG_KUNIT is enabled.
3. Use the mbt prefix to distinguish test APIs from actual kernel APIs.
4. Add a prepare function for ext4_mark_context and rename
ext4_mb_mark_group_bb to ext4_mb_mark_context.
5. Support running the mballoc test with different layout settings;
different block sizes are tested.
v4->v5:
1. The WARN on freeing blocks in an uninitialized group is removed, as the
normal fast commit path may trigger it; see [1] for details. Ojaswin's
review tag on the changed patch is also removed, and a further review is
needed.
v3->v4:
1. Collect Reviewed-by from Ojaswin.
2. Apply the improvements Ojaswin kindly suggested: fix a typo in a commit
message, WARN on trying to clear a bit of an uninitialized group, and
improve the refactoring of the AGGRESSIVE_CHECK code.
3. Fix a conflict in patch 16.
4. Improve the git logs of patches 16 and 17.
v2->v3:
1. Fix build warnings in "ext4: add some kunit stub for mballoc kunit
test" and "ext4: add first unit test for ext4_mb_new_blocks_simple in
mballoc".
This series is a new version of the unmerged patches from [1]. The first 8
patches of this series factor out the code that marks blocks used or freed
and updates the on-disk block bitmap and group descriptor. There are
several reasons to do this:
1. Pair the behavior of allocating and freeing bits. For example,
ext4_mb_new_blocks_simple updates free_clusters in struct flex_groups
via ext4_mb_mark_bb, while ext4_free_blocks_simple omits this.
2. Remove repeated code that reads from disk, updates, and writes back to
disk.
3. Reduce the mocks future unit tests will need to avoid real IO when
updating on-disk structures.
The last 3 patches add a unit test for mballoc. Before more unit tests
are added, there are a few points that should be discussed:
1. How to test static functions in mballoc.c.
Currently, mballoc-test.c is included in mballoc.c so that static
functions in mballoc.c can be tested from mballoc-test.c; this is one of
the ways suggested in [2] and documented in [3]. I am not sure whether
there is a more elegant way to test static functions without touching
mballoc.c.
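Concretely, the inclusion looks like this (this is exactly what patch 10
below adds at the bottom of mballoc.c):

#ifdef CONFIG_EXT4_KUNIT_TESTS
#include "mballoc-test.c"
#endif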
2. How to add mocks for functions in mballoc.c which may issue IO to disk.
Currently, KUNIT_STATIC_STUB_REDIRECT is added to those functions, as
suggested in the kunit documentation [4].
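To make the pattern concrete, here is its shape, condensed from patches 09
and 10 below: the real function gains a redirect hook, and the test
activates a replacement for it. (When CONFIG_KUNIT is disabled the
redirect takes no effect, per changelog item 2 of v5->v6 above.)

/* In the real function (fs/ext4/balloc.c, patch 09): */
struct ext4_group_desc *ext4_get_group_desc(struct super_block *sb,
					    ext4_group_t block_group,
					    struct buffer_head **bh)
{
	KUNIT_STATIC_STUB_REDIRECT(ext4_get_group_desc,
				   sb, block_group, bh);
	/* ... normal on-disk lookup follows ... */
}

/* In the test init (fs/ext4/mballoc-test.c, patch 10): */
	kunit_activate_static_stub(test, ext4_get_group_desc,
				   ext4_get_group_desc_stub);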
3. How to simulate a block bitmap.
Currently, a fake buffer_head with bitmap data is returned, so no
further change is needed.
If we instead simulated a block bitmap with an array of data structures
like:
struct test_bitmap {
	unsigned int start;
	unsigned int len;
};
which is what Theodore suggested in [5], then we would need to add mocks
for every function that expects a bitmap in bitmap_bh->b_data, such as
mb_find_next_bit, mb_find_next_zero_bit and maybe more.
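For illustration only, here is a rough sketch of one such extra mock,
assuming the ranges describe set bits and the array is terminated by a
zero-length entry (mbt_find_next_bit and this calling convention are
hypothetical, not from the patches):

/* Hypothetical: first set bit in [start, max), given ranges of set bits. */
static int mbt_find_next_bit(const struct test_bitmap *map, int max,
			     int start)
{
	for (; map->len != 0; map++) {
		int end = map->start + map->len;

		if (start >= end)
			continue;
		if (start < (int)map->start)
			start = map->start;
		return start < max ? start : max;
	}
	return max;
}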
I would like to hear any suggestions! Thanks!
I ran kvm-xfstests with config "ext4/all" and "-g auto" together with the
patch set for resize; you can see the detailed report in [6].
The unit test results are as follows:
# ./tools/testing/kunit/kunit.py run --kunitconfig=fs/ext4/.kunitconfig --raw_output
[18:23:36] Configuring KUnit Kernel ...
[18:23:36] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make ARCH=um O=.kunit --jobs=88
[18:23:40] Starting KUnit Kernel (1/1)...
KTAP version 1
1..2
KTAP version 1
# Subtest: ext4_mballoc_test
1..1
KTAP version 1
# Subtest: test_new_blocks_simple
ok 1 block_bits=10 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
ok 2 block_bits=12 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
ok 3 block_bits=18 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
# test_new_blocks_simple: pass:3 fail:0 skip:0 total:3
ok 1 test_new_blocks_simple
# Totals: pass:3 fail:0 skip:0 total:3
ok 1 ext4_mballoc_test
KTAP version 1
# Subtest: ext4_inode_test
1..1
KTAP version 1
# Subtest: inode_test_xtimestamp_decoding
ok 1 1901-12-13 Lower bound of 32bit < 0 timestamp, no extra bits
ok 2 1969-12-31 Upper bound of 32bit < 0 timestamp, no extra bits
ok 3 1970-01-01 Lower bound of 32bit >=0 timestamp, no extra bits
ok 4 2038-01-19 Upper bound of 32bit >=0 timestamp, no extra bits
ok 5 2038-01-19 Lower bound of 32bit <0 timestamp, lo extra sec bit on
ok 6 2106-02-07 Upper bound of 32bit <0 timestamp, lo extra sec bit on
ok 7 2106-02-07 Lower bound of 32bit >=0 timestamp, lo extra sec bit on
ok 8 2174-02-25 Upper bound of 32bit >=0 timestamp, lo extra sec bit on
ok 9 2174-02-25 Lower bound of 32bit <0 timestamp, hi extra sec bit on
ok 10 2242-03-16 Upper bound of 32bit <0 timestamp, hi extra sec bit on
ok 11 2242-03-16 Lower bound of 32bit >=0 timestamp, hi extra sec bit on
ok 12 2310-04-04 Upper bound of 32bit >=0 timestamp, hi extra sec bit on
ok 13 2310-04-04 Upper bound of 32bit>=0 timestamp, hi extra sec bit 1. 1 ns
ok 14 2378-04-22 Lower bound of 32bit>= timestamp. Extra sec bits 1. Max ns
ok 15 2378-04-22 Lower bound of 32bit >=0 timestamp. All extra sec bits on
ok 16 2446-05-10 Upper bound of 32bit >=0 timestamp. All extra sec bits on
# inode_test_xtimestamp_decoding: pass:16 fail:0 skip:0 total:16
ok 1 inode_test_xtimestamp_decoding
# Totals: pass:16 fail:0 skip:0 total:16
ok 2 ext4_inode_test
[18:23:41] Elapsed time: 4.960s total, 0.001s configuring, 4.842s building, 0.072s running
[1] https://lore.kernel.org/linux-ext4/20230603150327.3596033-1-shikemeng@huaweicloud.com/T/#m5ff8e3a058ce1cb272dfef3262cd3202ce6e4358
[2] https://lore.kernel.org/linux-ext4/ZC3MoWn2UO6p+Swp@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com/
[3] https://docs.kernel.org/dev-tools/kunit/usage.html#testing-static-functions
[4] https://docs.kernel.org/dev-tools/kunit/api/functionredirection.html#c.KUNIT_STATIC_STUB_REDIRECT
[5] https://lore.kernel.org/linux-ext4/20230317155047.GB3270589@mit.edu/
[6] https://lore.kernel.org/linux-ext4/20230629120044.1261968-1-shikemeng@huaweicloud.com/T/#mcc8fb0697fd54d9267c02c027e1eb3468026ae56
[7] https://lore.kernel.org/lkml/20230725172051.2142641-1-shikemeng@huaweicloud.com/
Kemeng Shi (11):
ext4: factor out codes to update block bitmap and group descriptor on
disk from ext4_mb_mark_bb
ext4: call ext4_mb_mark_context in ext4_free_blocks_simple
ext4: extend ext4_mb_mark_context to support allocation under journal
ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used
ext4: Separate block bitmap and buddy bitmap freeing in
ext4_mb_clear_bb()
ext4: call ext4_mb_mark_context in ext4_mb_clear_bb
ext4: Separate block bitmap and buddy bitmap freeing in
ext4_group_add_blocks()
ext4: call ext4_mb_mark_context in ext4_group_add_blocks()
ext4: add some kunit stub for mballoc kunit test
ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc
ext4: run mballoc test with different layout settings
fs/ext4/balloc.c | 10 +
fs/ext4/mballoc-test.c | 351 ++++++++++++++++++++++++++++
fs/ext4/mballoc.c | 505 ++++++++++++++++-------------------------
3 files changed, 557 insertions(+), 309 deletions(-)
create mode 100644 fs/ext4/mballoc-test.c
--
2.30.0
* [PATCH v6 01/11] ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb
2023-08-26 15:50 [PATCH v6 00/11] cleanups and unit test for mballoc Kemeng Shi
@ 2023-08-26 15:50 ` Kemeng Shi
2023-08-31 12:33 ` Ritesh Harjani
2023-08-26 15:50 ` [PATCH v6 02/11] ext4: call ext4_mb_mark_context in ext4_free_blocks_simple Kemeng Shi
From: Kemeng Shi @ 2023-08-26 15:50 UTC
To: tytso, adilger.kernel, ritesh.list, linux-ext4, linux-kernel
There are several reasons to add a general function that updates the
block bitmap and group descriptor on disk:
1. Pair the behavior of allocating and freeing bits. For example,
ext4_mb_new_blocks_simple updates free_clusters in struct flex_groups
via ext4_mb_mark_bb, while ext4_free_blocks_simple omits this.
2. Remove repeated code that reads from disk, updates, and writes back to
disk.
3. Reduce the mocks future unit tests will need to avoid real IO when
updating on-disk structures.
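The resulting calling pattern (taken from the diff below) is to prepare
the shared settings once and then mark one or more block ranges with the
prepared context:

	struct ext4_mark_context mc;

	/* collect the shared settings once; state 1 marks bits used,
	 * state 0 marks them free */
	ext4_mb_prepare_mark_context(&mc, sb, state);
	/* then mark one or more ranges, as in the ext4_mb_mark_bb loop */
	err = ext4_mb_mark_context(&mc, group, blkoff, len);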
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
---
fs/ext4/mballoc.c | 169 +++++++++++++++++++++++++++-------------------
1 file changed, 99 insertions(+), 70 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c91db9f57524..e2be572deb75 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3952,6 +3952,100 @@ void ext4_exit_mballoc(void)
ext4_groupinfo_destroy_slabs();
}
+/*
+ * Collect global settings once to reduce the number of variables passed
+ * to ext4_mb_mark_context. The target group block range is passed
+ * directly so that the prepared settings can be reused for multiple
+ * block ranges and the specific range being marked is clear at the call
+ * site.
+ */
+struct ext4_mark_context {
+ struct super_block *sb;
+ int state;
+};
+
+static inline void ext4_mb_prepare_mark_context(struct ext4_mark_context *mc,
+ struct super_block *sb,
+ int state)
+{
+ mc->sb = sb;
+ mc->state = state;
+}
+
+static int
+ext4_mb_mark_context(struct ext4_mark_context *mc, ext4_group_t group,
+ ext4_grpblk_t blkoff, ext4_grpblk_t len)
+{
+ struct super_block *sb = mc->sb;
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ struct buffer_head *bitmap_bh = NULL;
+ struct ext4_group_desc *gdp;
+ struct buffer_head *gdp_bh;
+ int err;
+ unsigned int i, already, changed;
+
+ bitmap_bh = ext4_read_block_bitmap(sb, group);
+ if (IS_ERR(bitmap_bh))
+ return PTR_ERR(bitmap_bh);
+
+ err = -EIO;
+ gdp = ext4_get_group_desc(sb, group, &gdp_bh);
+ if (!gdp)
+ goto out_err;
+
+ ext4_lock_group(sb, group);
+ if (ext4_has_group_desc_csum(sb) &&
+ (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
+ gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
+ ext4_free_group_clusters_set(sb, gdp,
+ ext4_free_clusters_after_init(sb, group, gdp));
+ }
+
+ already = 0;
+ for (i = 0; i < len; i++)
+ if (mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
+ mc->state)
+ already++;
+ changed = len - already;
+
+ if (mc->state) {
+ mb_set_bits(bitmap_bh->b_data, blkoff, len);
+ ext4_free_group_clusters_set(sb, gdp,
+ ext4_free_group_clusters(sb, gdp) - changed);
+ } else {
+ mb_clear_bits(bitmap_bh->b_data, blkoff, len);
+ ext4_free_group_clusters_set(sb, gdp,
+ ext4_free_group_clusters(sb, gdp) + changed);
+ }
+
+ ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
+ ext4_group_desc_csum_set(sb, group, gdp);
+ ext4_unlock_group(sb, group);
+
+ if (sbi->s_log_groups_per_flex) {
+ ext4_group_t flex_group = ext4_flex_group(sbi, group);
+ struct flex_groups *fg = sbi_array_rcu_deref(sbi,
+ s_flex_groups, flex_group);
+
+ if (mc->state)
+ atomic64_sub(changed, &fg->free_clusters);
+ else
+ atomic64_add(changed, &fg->free_clusters);
+ }
+
+ err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
+ if (err)
+ goto out_err;
+ err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
+ if (err)
+ goto out_err;
+
+ sync_dirty_buffer(bitmap_bh);
+ sync_dirty_buffer(gdp_bh);
+
+out_err:
+ brelse(bitmap_bh);
+ return err;
+}
/*
* Check quota and mark chosen space (ac->ac_b_ex) non-free in bitmaps
@@ -4078,16 +4172,14 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
int len, int state)
{
- struct buffer_head *bitmap_bh = NULL;
- struct ext4_group_desc *gdp;
- struct buffer_head *gdp_bh;
+ struct ext4_mark_context mc;
struct ext4_sb_info *sbi = EXT4_SB(sb);
ext4_group_t group;
ext4_grpblk_t blkoff;
- int i, err = 0;
- int already;
- unsigned int clen, clen_changed, thisgrp_len;
+ int err = 0;
+ unsigned int clen, thisgrp_len;
+ ext4_mb_prepare_mark_context(&mc, sb, state);
while (len > 0) {
ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
@@ -4107,80 +4199,17 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
ext4_error(sb, "Marking blocks in system zone - "
"Block = %llu, len = %u",
block, thisgrp_len);
- bitmap_bh = NULL;
- break;
- }
-
- bitmap_bh = ext4_read_block_bitmap(sb, group);
- if (IS_ERR(bitmap_bh)) {
- err = PTR_ERR(bitmap_bh);
- bitmap_bh = NULL;
break;
}
- err = -EIO;
- gdp = ext4_get_group_desc(sb, group, &gdp_bh);
- if (!gdp)
- break;
-
- ext4_lock_group(sb, group);
- already = 0;
- for (i = 0; i < clen; i++)
- if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
- !state)
- already++;
-
- clen_changed = clen - already;
- if (state)
- mb_set_bits(bitmap_bh->b_data, blkoff, clen);
- else
- mb_clear_bits(bitmap_bh->b_data, blkoff, clen);
- if (ext4_has_group_desc_csum(sb) &&
- (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
- gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
- ext4_free_group_clusters_set(sb, gdp,
- ext4_free_clusters_after_init(sb, group, gdp));
- }
- if (state)
- clen = ext4_free_group_clusters(sb, gdp) - clen_changed;
- else
- clen = ext4_free_group_clusters(sb, gdp) + clen_changed;
-
- ext4_free_group_clusters_set(sb, gdp, clen);
- ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
- ext4_group_desc_csum_set(sb, group, gdp);
-
- ext4_unlock_group(sb, group);
-
- if (sbi->s_log_groups_per_flex) {
- ext4_group_t flex_group = ext4_flex_group(sbi, group);
- struct flex_groups *fg = sbi_array_rcu_deref(sbi,
- s_flex_groups, flex_group);
-
- if (state)
- atomic64_sub(clen_changed, &fg->free_clusters);
- else
- atomic64_add(clen_changed, &fg->free_clusters);
-
- }
-
- err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
- if (err)
- break;
- sync_dirty_buffer(bitmap_bh);
- err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
- sync_dirty_buffer(gdp_bh);
+ err = ext4_mb_mark_context(&mc, group, blkoff, clen);
if (err)
break;
block += thisgrp_len;
len -= thisgrp_len;
- brelse(bitmap_bh);
BUG_ON(len < 0);
}
-
- if (err)
- brelse(bitmap_bh);
}
/*
--
2.30.0
* [PATCH v6 02/11] ext4: call ext4_mb_mark_context in ext4_free_blocks_simple
2023-08-26 15:50 [PATCH v6 00/11] cleanups and unit test for mballoc Kemeng Shi
2023-08-26 15:50 ` [PATCH v6 01/11] ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb Kemeng Shi
@ 2023-08-26 15:50 ` Kemeng Shi
2023-08-31 14:25 ` Ritesh Harjani
2023-08-26 15:50 ` [PATCH v6 03/11] ext4: extend ext4_mb_mark_context to support allocation under journal Kemeng Shi
From: Kemeng Shi @ 2023-08-26 15:50 UTC
To: tytso, adilger.kernel, ritesh.list, linux-ext4, linux-kernel
Call ext4_mb_mark_context in ext4_free_blocks_simple to:
1. Remove repeated code.
2. Pair the update of free_clusters done in ext4_mb_new_blocks_simple.
3. Add the missing ext4_lock_group/ext4_unlock_group protection.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
fs/ext4/mballoc.c | 35 +++--------------------------------
1 file changed, 3 insertions(+), 32 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index e2be572deb75..c803f74aaf63 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6414,43 +6414,14 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
static void ext4_free_blocks_simple(struct inode *inode, ext4_fsblk_t block,
unsigned long count)
{
- struct buffer_head *bitmap_bh;
+ struct ext4_mark_context mc;
struct super_block *sb = inode->i_sb;
- struct ext4_group_desc *gdp;
- struct buffer_head *gdp_bh;
ext4_group_t group;
ext4_grpblk_t blkoff;
- int already_freed = 0, err, i;
+ ext4_mb_prepare_mark_context(&mc, sb, 0);
ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
- bitmap_bh = ext4_read_block_bitmap(sb, group);
- if (IS_ERR(bitmap_bh)) {
- pr_warn("Failed to read block bitmap\n");
- return;
- }
- gdp = ext4_get_group_desc(sb, group, &gdp_bh);
- if (!gdp)
- goto err_out;
-
- for (i = 0; i < count; i++) {
- if (!mb_test_bit(blkoff + i, bitmap_bh->b_data))
- already_freed++;
- }
- mb_clear_bits(bitmap_bh->b_data, blkoff, count);
- err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
- if (err)
- goto err_out;
- ext4_free_group_clusters_set(
- sb, gdp, ext4_free_group_clusters(sb, gdp) +
- count - already_freed);
- ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
- ext4_group_desc_csum_set(sb, group, gdp);
- ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
- sync_dirty_buffer(bitmap_bh);
- sync_dirty_buffer(gdp_bh);
-
-err_out:
- brelse(bitmap_bh);
+ ext4_mb_mark_context(&mc, group, blkoff, count);
}
/**
--
2.30.0
* [PATCH v6 03/11] ext4: extend ext4_mb_mark_context to support allocation under journal
2023-08-26 15:50 [PATCH v6 00/11] cleanups and unit test for mballoc Kemeng Shi
2023-08-26 15:50 ` [PATCH v6 01/11] ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb Kemeng Shi
2023-08-26 15:50 ` [PATCH v6 02/11] ext4: call ext4_mb_mark_context in ext4_free_blocks_simple Kemeng Shi
@ 2023-08-26 15:50 ` Kemeng Shi
2023-08-31 15:51 ` Ritesh Harjani
2023-08-26 15:50 ` [PATCH v6 04/11] ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used Kemeng Shi
From: Kemeng Shi @ 2023-08-26 15:50 UTC
To: tytso, adilger.kernel, ritesh.list, linux-ext4, linux-kernel
Previously, ext4_mb_mark_context was only called in the fast commit
replay path, so there was no valid handle when we updated the block
bitmap and group descriptor. This patch extends ext4_mb_mark_context so
it can be used by code running under a journal handle. There are several
improvements:
1. Add "handle_t *handle" to struct ext4_mark_context to accept a handle
for journaling the block bitmap and group descriptor updates inside
ext4_mb_mark_context (the added journal code is based on the journal
code in ext4_mb_mark_diskspace_used, where ext4_mb_mark_context is going
to be used).
2. Add the EXT4_MB_BITMAP_MARKED_CHECK flag to control checking whether
bits in the block bitmap are already marked, since allocation code under
the journal asserts that none of the bits to be changed are marked
beforehand.
3. Add "ext4_grpblk_t changed" to struct ext4_mark_context to report the
number of bits in the block bitmap that actually changed.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
---
fs/ext4/mballoc.c | 65 +++++++++++++++++++++++++++++++++++------------
1 file changed, 49 insertions(+), 16 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c803f74aaf63..b066ee018cdb 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3952,6 +3952,8 @@ void ext4_exit_mballoc(void)
ext4_groupinfo_destroy_slabs();
}
+#define EXT4_MB_BITMAP_MARKED_CHECK 0x0001
+#define EXT4_MB_SYNC_UPDATE 0x0002
/*
* Collect global setting to reduce the number of variable passing to
* ext4_mb_mark_context. Pass target group blocks range directly to
@@ -3959,39 +3961,61 @@ void ext4_exit_mballoc(void)
* to show clearly the specific block range will be marked.
*/
struct ext4_mark_context {
+ handle_t *handle;
struct super_block *sb;
int state;
+ ext4_grpblk_t changed;
};
static inline void ext4_mb_prepare_mark_context(struct ext4_mark_context *mc,
+ handle_t *handle,
struct super_block *sb,
int state)
{
+ mc->handle = handle;
mc->sb = sb;
mc->state = state;
}
static int
ext4_mb_mark_context(struct ext4_mark_context *mc, ext4_group_t group,
- ext4_grpblk_t blkoff, ext4_grpblk_t len)
+ ext4_grpblk_t blkoff, ext4_grpblk_t len, int flags)
{
+ handle_t *handle = mc->handle;
struct super_block *sb = mc->sb;
struct ext4_sb_info *sbi = EXT4_SB(sb);
struct buffer_head *bitmap_bh = NULL;
struct ext4_group_desc *gdp;
struct buffer_head *gdp_bh;
int err;
- unsigned int i, already, changed;
+ unsigned int i, already, changed = len;
+ mc->changed = 0;
bitmap_bh = ext4_read_block_bitmap(sb, group);
if (IS_ERR(bitmap_bh))
return PTR_ERR(bitmap_bh);
+ if (handle) {
+ BUFFER_TRACE(bitmap_bh, "getting write access");
+ err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
+ EXT4_JTR_NONE);
+ if (err)
+ goto out_err;
+ }
+
err = -EIO;
gdp = ext4_get_group_desc(sb, group, &gdp_bh);
if (!gdp)
goto out_err;
+ if (handle) {
+ BUFFER_TRACE(gdp_bh, "get_write_access");
+ err = ext4_journal_get_write_access(handle, sb, gdp_bh,
+ EXT4_JTR_NONE);
+ if (err)
+ goto out_err;
+ }
+
ext4_lock_group(sb, group);
if (ext4_has_group_desc_csum(sb) &&
(gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
@@ -4000,12 +4024,14 @@ ext4_mb_mark_context(struct ext4_mark_context *mc, ext4_group_t group,
ext4_free_clusters_after_init(sb, group, gdp));
}
- already = 0;
- for (i = 0; i < len; i++)
- if (mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
- mc->state)
- already++;
- changed = len - already;
+ if (flags & EXT4_MB_BITMAP_MARKED_CHECK) {
+ already = 0;
+ for (i = 0; i < len; i++)
+ if (mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
+ mc->state)
+ already++;
+ changed = len - already;
+ }
if (mc->state) {
mb_set_bits(bitmap_bh->b_data, blkoff, len);
@@ -4020,6 +4046,7 @@ ext4_mb_mark_context(struct ext4_mark_context *mc, ext4_group_t group,
ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
ext4_group_desc_csum_set(sb, group, gdp);
ext4_unlock_group(sb, group);
+ mc->changed = changed;
if (sbi->s_log_groups_per_flex) {
ext4_group_t flex_group = ext4_flex_group(sbi, group);
@@ -4032,15 +4059,17 @@ ext4_mb_mark_context(struct ext4_mark_context *mc, ext4_group_t group,
atomic64_add(changed, &fg->free_clusters);
}
- err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
+ err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
if (err)
goto out_err;
- err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
+ err = ext4_handle_dirty_metadata(handle, NULL, gdp_bh);
if (err)
goto out_err;
- sync_dirty_buffer(bitmap_bh);
- sync_dirty_buffer(gdp_bh);
+ if (flags & EXT4_MB_SYNC_UPDATE) {
+ sync_dirty_buffer(bitmap_bh);
+ sync_dirty_buffer(gdp_bh);
+ }
out_err:
brelse(bitmap_bh);
@@ -4179,7 +4208,7 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
int err = 0;
unsigned int clen, thisgrp_len;
- ext4_mb_prepare_mark_context(&mc, sb, state);
+ ext4_mb_prepare_mark_context(&mc, NULL, sb, state);
while (len > 0) {
ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
@@ -4202,7 +4231,9 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
break;
}
- err = ext4_mb_mark_context(&mc, group, blkoff, clen);
+ err = ext4_mb_mark_context(&mc, group, blkoff, clen,
+ EXT4_MB_BITMAP_MARKED_CHECK |
+ EXT4_MB_SYNC_UPDATE);
if (err)
break;
@@ -6419,9 +6450,11 @@ static void ext4_free_blocks_simple(struct inode *inode, ext4_fsblk_t block,
ext4_group_t group;
ext4_grpblk_t blkoff;
- ext4_mb_prepare_mark_context(&mc, sb, 0);
+ ext4_mb_prepare_mark_context(&mc, NULL, sb, 0);
ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
- ext4_mb_mark_context(&mc, group, blkoff, count);
+ ext4_mb_mark_context(&mc, group, blkoff, count,
+ EXT4_MB_BITMAP_MARKED_CHECK |
+ EXT4_MB_SYNC_UPDATE);
}
/**
--
2.30.0
* [PATCH v6 04/11] ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used
2023-08-26 15:50 [PATCH v6 00/11] cleanups and unit test for mballoc Kemeng Shi
2023-08-26 15:50 ` [PATCH v6 03/11] ext4: extend ext4_mb_mark_context to support allocation under journal Kemeng Shi
@ 2023-08-26 15:50 ` Kemeng Shi
2023-09-01 3:51 ` Ritesh Harjani
2023-08-26 15:50 ` [PATCH v6 05/11] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb() Kemeng Shi
From: Kemeng Shi @ 2023-08-26 15:50 UTC
To: tytso, adilger.kernel, ritesh.list, linux-ext4, linux-kernel
Call ext4_mb_mark_context in ext4_mb_mark_diskspace_used to:
1. Remove the repeated code that normally updates the bitmap and group
descriptor on disk.
2. Call ext4_mb_mark_context instead of only setting bits in the block
bitmap when fixing the bitmap. ext4_mb_mark_context also updates the
bitmap checksum and the other counters along with the bit change;
otherwise the bitmap checksum would be left inconsistent with the bits
and the block bitmap would be marked corrupted.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
---
fs/ext4/mballoc.c | 86 +++++++++++------------------------------------
1 file changed, 20 insertions(+), 66 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index b066ee018cdb..e650eac22237 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4084,46 +4084,28 @@ static noinline_for_stack int
ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
handle_t *handle, unsigned int reserv_clstrs)
{
- struct buffer_head *bitmap_bh = NULL;
+ struct ext4_mark_context mc;
struct ext4_group_desc *gdp;
- struct buffer_head *gdp_bh;
struct ext4_sb_info *sbi;
struct super_block *sb;
ext4_fsblk_t block;
int err, len;
+ int flags = 0;
BUG_ON(ac->ac_status != AC_STATUS_FOUND);
BUG_ON(ac->ac_b_ex.fe_len <= 0);
sb = ac->ac_sb;
sbi = EXT4_SB(sb);
+ ext4_mb_prepare_mark_context(&mc, handle, sb, 1);
- bitmap_bh = ext4_read_block_bitmap(sb, ac->ac_b_ex.fe_group);
- if (IS_ERR(bitmap_bh)) {
- return PTR_ERR(bitmap_bh);
- }
-
- BUFFER_TRACE(bitmap_bh, "getting write access");
- err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
- EXT4_JTR_NONE);
- if (err)
- goto out_err;
-
- err = -EIO;
- gdp = ext4_get_group_desc(sb, ac->ac_b_ex.fe_group, &gdp_bh);
+ gdp = ext4_get_group_desc(sb, ac->ac_b_ex.fe_group, NULL);
if (!gdp)
- goto out_err;
-
+ return -EIO;
ext4_debug("using block group %u(%d)\n", ac->ac_b_ex.fe_group,
ext4_free_group_clusters(sb, gdp));
- BUFFER_TRACE(gdp_bh, "get_write_access");
- err = ext4_journal_get_write_access(handle, sb, gdp_bh, EXT4_JTR_NONE);
- if (err)
- goto out_err;
-
block = ext4_grp_offs_to_block(sb, &ac->ac_b_ex);
-
len = EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
if (!ext4_inode_block_valid(ac->ac_inode, block, len)) {
ext4_error(sb, "Allocating blocks %llu-%llu which overlap "
@@ -4132,41 +4114,28 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
* Fix the bitmap and return EFSCORRUPTED
* We leak some of the blocks here.
*/
- ext4_lock_group(sb, ac->ac_b_ex.fe_group);
- mb_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start,
- ac->ac_b_ex.fe_len);
- ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
- err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
+ err = ext4_mb_mark_context(&mc, ac->ac_b_ex.fe_group,
+ ac->ac_b_ex.fe_start,
+ ac->ac_b_ex.fe_len,
+ 0);
if (!err)
err = -EFSCORRUPTED;
- goto out_err;
+ return err;
}
- ext4_lock_group(sb, ac->ac_b_ex.fe_group);
#ifdef AGGRESSIVE_CHECK
- {
- int i;
- for (i = 0; i < ac->ac_b_ex.fe_len; i++) {
- BUG_ON(mb_test_bit(ac->ac_b_ex.fe_start + i,
- bitmap_bh->b_data));
- }
- }
+ flags |= EXT4_MB_BITMAP_MARKED_CHECK;
#endif
- mb_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start,
- ac->ac_b_ex.fe_len);
- if (ext4_has_group_desc_csum(sb) &&
- (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
- gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
- ext4_free_group_clusters_set(sb, gdp,
- ext4_free_clusters_after_init(sb,
- ac->ac_b_ex.fe_group, gdp));
- }
- len = ext4_free_group_clusters(sb, gdp) - ac->ac_b_ex.fe_len;
- ext4_free_group_clusters_set(sb, gdp, len);
- ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
- ext4_group_desc_csum_set(sb, ac->ac_b_ex.fe_group, gdp);
+ err = ext4_mb_mark_context(&mc, ac->ac_b_ex.fe_group,
+ ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len,
+ flags);
+
+ if (err && mc.changed == 0)
+ return err;
- ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
+#ifdef AGGRESSIVE_CHECK
+ BUG_ON(mc.changed != ac->ac_b_ex.fe_len);
+#endif
percpu_counter_sub(&sbi->s_freeclusters_counter, ac->ac_b_ex.fe_len);
/*
* Now reduce the dirty block count also. Should not go negative
@@ -4176,21 +4145,6 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
percpu_counter_sub(&sbi->s_dirtyclusters_counter,
reserv_clstrs);
- if (sbi->s_log_groups_per_flex) {
- ext4_group_t flex_group = ext4_flex_group(sbi,
- ac->ac_b_ex.fe_group);
- atomic64_sub(ac->ac_b_ex.fe_len,
- &sbi_array_rcu_deref(sbi, s_flex_groups,
- flex_group)->free_clusters);
- }
-
- err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
- if (err)
- goto out_err;
- err = ext4_handle_dirty_metadata(handle, NULL, gdp_bh);
-
-out_err:
- brelse(bitmap_bh);
return err;
}
--
2.30.0
* [PATCH v6 05/11] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb()
2023-08-26 15:50 [PATCH v6 00/11] cleanups and unit test for mballoc Kemeng Shi
2023-08-26 15:50 ` [PATCH v6 04/11] ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used Kemeng Shi
@ 2023-08-26 15:50 ` Kemeng Shi
2023-09-01 9:34 ` Ritesh Harjani
2023-08-26 15:50 ` [PATCH v6 06/11] ext4: call ext4_mb_mark_context in ext4_mb_clear_bb Kemeng Shi
From: Kemeng Shi @ 2023-08-26 15:50 UTC
To: tytso, adilger.kernel, ritesh.list, linux-ext4, linux-kernel
This patch separates block bitmap freeing from buddy bitmap freeing in
order to update the block bitmap with ext4_mb_mark_context in a
following patch.
Separated freeing is safe with concurrent allocation as long as:
1. Blocks are allocated in the buddy bitmap first, and then in the block
bitmap.
2. Blocks are freed in the block bitmap first, and then in the buddy
bitmap.
A freed block then only becomes available for allocation once both the
buddy bitmap and the block bitmap have been updated by the freeing.
Allocation already obeys rule 1, so we just do the separated freeing
following rule 2; see the sketch after this description.
Separated freeing has no race with generate_buddy either: once
ext4_mb_load_buddy_gfp has executed successfully, the up-to-date buddy
page can be found in sbi->s_buddy_cache and no further buddy
initialization of that page will be executed concurrently until the
buddy page is unloaded. As we always free in a "load buddy, free,
unload buddy" sequence, separated freeing has no race with
generate_buddy.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
fs/ext4/mballoc.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index e650eac22237..01ad36a1cc96 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6519,6 +6519,14 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
if (err)
goto error_return;
+ ext4_lock_group(sb, block_group);
+ mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
+ ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
+ ext4_free_group_clusters_set(sb, gdp, ret);
+ ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
+ ext4_group_desc_csum_set(sb, block_group, gdp);
+ ext4_unlock_group(sb, block_group);
+
/*
* We need to make sure we don't reuse the freed block until after the
* transaction is committed. We make an exception if the inode is to be
@@ -6541,13 +6549,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
new_entry->efd_tid = handle->h_transaction->t_tid;
ext4_lock_group(sb, block_group);
- mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
ext4_mb_free_metadata(handle, &e4b, new_entry);
} else {
- /* need to update group_info->bb_free and bitmap
- * with group lock held. generate_buddy look at
- * them with group lock_held
- */
if (test_opt(sb, DISCARD)) {
err = ext4_issue_discard(sb, block_group, bit,
count_clusters, NULL);
@@ -6560,14 +6563,9 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info);
ext4_lock_group(sb, block_group);
- mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
mb_free_blocks(inode, &e4b, bit, count_clusters);
}
- ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
- ext4_free_group_clusters_set(sb, gdp, ret);
- ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
- ext4_group_desc_csum_set(sb, block_group, gdp);
ext4_unlock_group(sb, block_group);
if (sbi->s_log_groups_per_flex) {
--
2.30.0
* [PATCH v6 06/11] ext4: call ext4_mb_mark_context in ext4_mb_clear_bb
2023-08-26 15:50 [PATCH v6 00/11] cleanups and unit test for mballoc Kemeng Shi
2023-08-26 15:50 ` [PATCH v6 05/11] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb() Kemeng Shi
@ 2023-08-26 15:50 ` Kemeng Shi
2023-09-01 9:38 ` Ritesh Harjani
2023-08-26 15:50 ` [PATCH v6 07/11] ext4: Separate block bitmap and buddy bitmap freeing in ext4_group_add_blocks() Kemeng Shi
From: Kemeng Shi @ 2023-08-26 15:50 UTC
To: tytso, adilger.kernel, ritesh.list, linux-ext4, linux-kernel
Call ext4_mb_mark_context in ext4_mb_clear_bb to remove repeated code.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
fs/ext4/mballoc.c | 83 +++++++++++------------------------------------
1 file changed, 19 insertions(+), 64 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 01ad36a1cc96..25265531cb6a 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6424,21 +6424,20 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
ext4_fsblk_t block, unsigned long count,
int flags)
{
- struct buffer_head *bitmap_bh = NULL;
+ struct ext4_mark_context mc;
struct super_block *sb = inode->i_sb;
- struct ext4_group_desc *gdp;
struct ext4_group_info *grp;
unsigned int overflow;
ext4_grpblk_t bit;
- struct buffer_head *gd_bh;
ext4_group_t block_group;
struct ext4_sb_info *sbi;
struct ext4_buddy e4b;
unsigned int count_clusters;
int err = 0;
- int ret;
+ int mark_flags = 0;
sbi = EXT4_SB(sb);
+ ext4_mb_prepare_mark_context(&mc, handle, sb, 0);
if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) &&
!ext4_inode_block_valid(inode, block, count)) {
@@ -6468,18 +6467,6 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
/* The range changed so it's no longer validated */
flags &= ~EXT4_FREE_BLOCKS_VALIDATED;
}
- count_clusters = EXT4_NUM_B2C(sbi, count);
- bitmap_bh = ext4_read_block_bitmap(sb, block_group);
- if (IS_ERR(bitmap_bh)) {
- err = PTR_ERR(bitmap_bh);
- bitmap_bh = NULL;
- goto error_return;
- }
- gdp = ext4_get_group_desc(sb, block_group, &gd_bh);
- if (!gdp) {
- err = -EIO;
- goto error_return;
- }
if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) &&
!ext4_inode_block_valid(inode, block, count)) {
@@ -6489,28 +6476,7 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
goto error_return;
}
- BUFFER_TRACE(bitmap_bh, "getting write access");
- err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
- EXT4_JTR_NONE);
- if (err)
- goto error_return;
-
- /*
- * We are about to modify some metadata. Call the journal APIs
- * to unshare ->b_data if a currently-committing transaction is
- * using it
- */
- BUFFER_TRACE(gd_bh, "get_write_access");
- err = ext4_journal_get_write_access(handle, sb, gd_bh, EXT4_JTR_NONE);
- if (err)
- goto error_return;
-#ifdef AGGRESSIVE_CHECK
- {
- int i;
- for (i = 0; i < count_clusters; i++)
- BUG_ON(!mb_test_bit(bit + i, bitmap_bh->b_data));
- }
-#endif
+ count_clusters = EXT4_NUM_B2C(sbi, count);
trace_ext4_mballoc_free(sb, inode, block_group, bit, count_clusters);
/* __GFP_NOFAIL: retry infinitely, ignore TIF_MEMDIE and memcg limit. */
@@ -6519,13 +6485,21 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
if (err)
goto error_return;
- ext4_lock_group(sb, block_group);
- mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
- ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
- ext4_free_group_clusters_set(sb, gdp, ret);
- ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
- ext4_group_desc_csum_set(sb, block_group, gdp);
- ext4_unlock_group(sb, block_group);
+#ifdef AGGRESSIVE_CHECK
+ mark_flags |= EXT4_MB_BITMAP_MARKED_CHECK;
+#endif
+ err = ext4_mb_mark_context(&mc, block_group, bit, count_clusters,
+ mark_flags);
+
+
+ if (err && mc.changed == 0) {
+ ext4_mb_unload_buddy(&e4b);
+ goto error_return;
+ }
+
+#ifdef AGGRESSIVE_CHECK
+ BUG_ON(mc.changed != count_clusters);
+#endif
/*
* We need to make sure we don't reuse the freed block until after the
@@ -6568,13 +6542,6 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
ext4_unlock_group(sb, block_group);
- if (sbi->s_log_groups_per_flex) {
- ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
- atomic64_add(count_clusters,
- &sbi_array_rcu_deref(sbi, s_flex_groups,
- flex_group)->free_clusters);
- }
-
/*
* on a bigalloc file system, defer the s_freeclusters_counter
* update to the caller (ext4_remove_space and friends) so they
@@ -6589,26 +6556,14 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
ext4_mb_unload_buddy(&e4b);
- /* We dirtied the bitmap block */
- BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
- err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
-
- /* And the group descriptor block */
- BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
- ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
- if (!err)
- err = ret;
-
if (overflow && !err) {
block += count;
count = overflow;
- put_bh(bitmap_bh);
/* The range changed so it's no longer validated */
flags &= ~EXT4_FREE_BLOCKS_VALIDATED;
goto do_more;
}
error_return:
- brelse(bitmap_bh);
ext4_std_error(sb, err);
}
--
2.30.0
* [PATCH v6 07/11] ext4: Separate block bitmap and buddy bitmap freeing in ext4_group_add_blocks()
2023-08-26 15:50 [PATCH v6 00/11] cleanups and unit test for mballoc Kemeng Shi
2023-08-26 15:50 ` [PATCH v6 06/11] ext4: call ext4_mb_mark_context in ext4_mb_clear_bb Kemeng Shi
@ 2023-08-26 15:50 ` Kemeng Shi
2023-08-26 15:50 ` [PATCH v6 08/11] ext4: call ext4_mb_mark_context " Kemeng Shi
From: Kemeng Shi @ 2023-08-26 15:50 UTC
To: tytso, adilger.kernel, ritesh.list, linux-ext4, linux-kernel
This patch separates block bitmap freeing from buddy bitmap freeing in
order to update the block bitmap with ext4_mb_mark_context in a
following patch.
The reason why the freeing can be separated is explained in the previous
patch. The explanation is repeated here to simplify future code
archaeology of ext4_group_add_blocks():
Separated freeing is safe with concurrent allocation as long as:
1. Blocks are allocated in the buddy bitmap first, and then in the block
bitmap.
2. Blocks are freed in the block bitmap first, and then in the buddy
bitmap.
A freed block then only becomes available for allocation once both the
buddy bitmap and the block bitmap have been updated by the freeing.
Allocation already obeys rule 1, so we just do the separated freeing
following rule 2.
Separated freeing has no race with generate_buddy: once
ext4_mb_load_buddy_gfp has executed successfully, the up-to-date buddy
page can be found in sbi->s_buddy_cache and no further buddy
initialization of that page will be executed concurrently until the
buddy page is unloaded. As we always free in a "load buddy, free,
unload buddy" sequence, separated freeing has no race with
generate_buddy.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
fs/ext4/mballoc.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 25265531cb6a..bb08a71a6e61 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6764,20 +6764,19 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
if (err)
goto error_return;
- /*
- * need to update group_info->bb_free and bitmap
- * with group lock held. generate_buddy look at
- * them with group lock_held
- */
ext4_lock_group(sb, block_group);
mb_clear_bits(bitmap_bh->b_data, bit, cluster_count);
- mb_free_blocks(NULL, &e4b, bit, cluster_count);
free_clusters_count = clusters_freed +
ext4_free_group_clusters(sb, desc);
ext4_free_group_clusters_set(sb, desc, free_clusters_count);
ext4_block_bitmap_csum_set(sb, desc, bitmap_bh);
ext4_group_desc_csum_set(sb, block_group, desc);
ext4_unlock_group(sb, block_group);
+
+ ext4_lock_group(sb, block_group);
+ mb_free_blocks(NULL, &e4b, bit, cluster_count);
+ ext4_unlock_group(sb, block_group);
+
percpu_counter_add(&sbi->s_freeclusters_counter,
clusters_freed);
--
2.30.0
* [PATCH v6 08/11] ext4: call ext4_mb_mark_context in ext4_group_add_blocks()
2023-08-26 15:50 [PATCH v6 00/11] cleanups and unit test for mballoc Kemeng Shi
2023-08-26 15:50 ` [PATCH v6 07/11] ext4: Separate block bitmap and buddy bitmap freeing in ext4_group_add_blocks() Kemeng Shi
@ 2023-08-26 15:50 ` Kemeng Shi
2023-09-01 9:50 ` Ritesh Harjani
2023-08-26 15:50 ` [PATCH v6 09/11] ext4: add some kunit stub for mballoc kunit test Kemeng Shi
From: Kemeng Shi @ 2023-08-26 15:50 UTC
To: tytso, adilger.kernel, ritesh.list, linux-ext4, linux-kernel
Call ext4_mb_mark_context in ext4_group_add_blocks() to remove repeated code.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
fs/ext4/mballoc.c | 84 +++++++----------------------------------------
1 file changed, 12 insertions(+), 72 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index bb08a71a6e61..fdffa3b40bcd 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6681,25 +6681,22 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
ext4_fsblk_t block, unsigned long count)
{
- struct buffer_head *bitmap_bh = NULL;
- struct buffer_head *gd_bh;
+ struct ext4_mark_context mc;
ext4_group_t block_group;
ext4_grpblk_t bit;
- unsigned int i;
- struct ext4_group_desc *desc;
struct ext4_sb_info *sbi = EXT4_SB(sb);
struct ext4_buddy e4b;
- int err = 0, ret, free_clusters_count;
- ext4_grpblk_t clusters_freed;
+ int err = 0;
ext4_fsblk_t first_cluster = EXT4_B2C(sbi, block);
ext4_fsblk_t last_cluster = EXT4_B2C(sbi, block + count - 1);
unsigned long cluster_count = last_cluster - first_cluster + 1;
ext4_debug("Adding block(s) %llu-%llu\n", block, block + count - 1);
- if (count == 0)
+ if (cluster_count == 0)
return 0;
+ ext4_mb_prepare_mark_context(&mc, handle, sb, 0);
ext4_get_group_no_and_offset(sb, block, &block_group, &bit);
/*
* Check to see if we are freeing blocks across a group
@@ -6712,19 +6709,6 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
goto error_return;
}
- bitmap_bh = ext4_read_block_bitmap(sb, block_group);
- if (IS_ERR(bitmap_bh)) {
- err = PTR_ERR(bitmap_bh);
- bitmap_bh = NULL;
- goto error_return;
- }
-
- desc = ext4_get_group_desc(sb, block_group, &gd_bh);
- if (!desc) {
- err = -EIO;
- goto error_return;
- }
-
if (!ext4_sb_block_valid(sb, NULL, block, count)) {
ext4_error(sb, "Adding blocks in system zones - "
"Block = %llu, count = %lu",
@@ -6733,74 +6717,30 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
goto error_return;
}
- BUFFER_TRACE(bitmap_bh, "getting write access");
- err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
- EXT4_JTR_NONE);
+ err = ext4_mb_load_buddy(sb, block_group, &e4b);
if (err)
goto error_return;
- /*
- * We are about to modify some metadata. Call the journal APIs
- * to unshare ->b_data if a currently-committing transaction is
- * using it
- */
- BUFFER_TRACE(gd_bh, "get_write_access");
- err = ext4_journal_get_write_access(handle, sb, gd_bh, EXT4_JTR_NONE);
- if (err)
+ err = ext4_mb_mark_context(&mc, block_group, bit, cluster_count,
+ EXT4_MB_BITMAP_MARKED_CHECK);
+ if (err && mc.changed == 0) {
+ ext4_mb_unload_buddy(&e4b);
goto error_return;
-
- for (i = 0, clusters_freed = 0; i < cluster_count; i++) {
- BUFFER_TRACE(bitmap_bh, "clear bit");
- if (!mb_test_bit(bit + i, bitmap_bh->b_data)) {
- ext4_error(sb, "bit already cleared for block %llu",
- (ext4_fsblk_t)(block + i));
- BUFFER_TRACE(bitmap_bh, "bit already cleared");
- } else {
- clusters_freed++;
- }
}
- err = ext4_mb_load_buddy(sb, block_group, &e4b);
- if (err)
- goto error_return;
-
- ext4_lock_group(sb, block_group);
- mb_clear_bits(bitmap_bh->b_data, bit, cluster_count);
- free_clusters_count = clusters_freed +
- ext4_free_group_clusters(sb, desc);
- ext4_free_group_clusters_set(sb, desc, free_clusters_count);
- ext4_block_bitmap_csum_set(sb, desc, bitmap_bh);
- ext4_group_desc_csum_set(sb, block_group, desc);
- ext4_unlock_group(sb, block_group);
+ if (mc.changed != cluster_count)
+ ext4_error(sb, "bit already cleared in group %u", block_group);
ext4_lock_group(sb, block_group);
mb_free_blocks(NULL, &e4b, bit, cluster_count);
ext4_unlock_group(sb, block_group);
percpu_counter_add(&sbi->s_freeclusters_counter,
- clusters_freed);
-
- if (sbi->s_log_groups_per_flex) {
- ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
- atomic64_add(clusters_freed,
- &sbi_array_rcu_deref(sbi, s_flex_groups,
- flex_group)->free_clusters);
- }
+ mc.changed);
ext4_mb_unload_buddy(&e4b);
- /* We dirtied the bitmap block */
- BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
- err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
-
- /* And the group descriptor block */
- BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
- ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
- if (!err)
- err = ret;
-
error_return:
- brelse(bitmap_bh);
ext4_std_error(sb, err);
return err;
}
--
2.30.0
* [PATCH v6 09/11] ext4: add some kunit stub for mballoc kunit test
2023-08-26 15:50 [PATCH v6 00/11] cleanups and unit test for mballoc Kemeng Shi
2023-08-26 15:50 ` [PATCH v6 08/11] ext4: call ext4_mb_mark_context " Kemeng Shi
@ 2023-08-26 15:50 ` Kemeng Shi
2023-09-01 14:18 ` Ritesh Harjani
2023-08-26 15:50 ` [PATCH v6 10/11] ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc Kemeng Shi
From: Kemeng Shi @ 2023-08-26 15:50 UTC
To: tytso, adilger.kernel, ritesh.list, linux-ext4, linux-kernel
Multiblock allocation reads and writes the block bitmap and group
descriptor, which reside on disk. Add kunit stubs to the functions
ext4_get_group_desc, ext4_read_block_bitmap_nowait, ext4_wait_block_bitmap
and ext4_mb_mark_context so the tests avoid real IO to disk.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
fs/ext4/balloc.c | 10 ++++++++++
fs/ext4/mballoc.c | 4 ++++
2 files changed, 14 insertions(+)
diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 79b20d6ae39e..b0bf255b159f 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -22,6 +22,7 @@
#include "mballoc.h"
#include <trace/events/ext4.h>
+#include <kunit/static_stub.h>
static unsigned ext4_num_base_meta_clusters(struct super_block *sb,
ext4_group_t block_group);
@@ -274,6 +275,9 @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block *sb,
struct ext4_sb_info *sbi = EXT4_SB(sb);
struct buffer_head *bh_p;
+ KUNIT_STATIC_STUB_REDIRECT(ext4_get_group_desc,
+ sb, block_group, bh);
+
if (block_group >= ngroups) {
ext4_error(sb, "block_group >= groups_count - block_group = %u,"
" groups_count = %u", block_group, ngroups);
@@ -468,6 +472,9 @@ ext4_read_block_bitmap_nowait(struct super_block *sb, ext4_group_t block_group,
ext4_fsblk_t bitmap_blk;
int err;
+ KUNIT_STATIC_STUB_REDIRECT(ext4_read_block_bitmap_nowait,
+ sb, block_group, ignore_locked);
+
desc = ext4_get_group_desc(sb, block_group, NULL);
if (!desc)
return ERR_PTR(-EFSCORRUPTED);
@@ -563,6 +570,9 @@ int ext4_wait_block_bitmap(struct super_block *sb, ext4_group_t block_group,
{
struct ext4_group_desc *desc;
+ KUNIT_STATIC_STUB_REDIRECT(ext4_wait_block_bitmap,
+ sb, block_group, bh);
+
if (!buffer_new(bh))
return 0;
desc = ext4_get_group_desc(sb, block_group, NULL);
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index fdffa3b40bcd..5961b08ae7df 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -17,6 +17,7 @@
#include <linux/nospec.h>
#include <linux/backing-dev.h>
#include <trace/events/ext4.h>
+#include <kunit/static_stub.h>
/*
* MUSTDO:
@@ -3990,6 +3991,9 @@ ext4_mb_mark_context(struct ext4_mark_context *mc, ext4_group_t group,
int err;
unsigned int i, already, changed = len;
+ KUNIT_STATIC_STUB_REDIRECT(ext4_mb_mark_context,
+ mc, group, blkoff, len, flags);
+
mc->changed = 0;
bitmap_bh = ext4_read_block_bitmap(sb, group);
if (IS_ERR(bitmap_bh))
--
2.30.0
* [PATCH v6 10/11] ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc
2023-08-26 15:50 [PATCH v6 00/11] cleanups and unit test for mballoc Kemeng Shi
2023-08-26 15:50 ` [PATCH v6 09/11] ext4: add some kunit stub for mballoc kunit test Kemeng Shi
@ 2023-08-26 15:50 ` Kemeng Shi
2023-09-01 14:29 ` Ritesh Harjani
2023-08-26 15:50 ` [PATCH v6 11/11] ext4: run mballoc test with different layout settings Kemeng Shi
2023-08-29 19:02 ` [PATCH v6 00/11] cleanups and unit test for mballoc Ritesh Harjani
From: Kemeng Shi @ 2023-08-26 15:50 UTC
To: tytso, adilger.kernel, ritesh.list, linux-ext4, linux-kernel
The preparatory work is:
1. Include mballoc-test.c in mballoc.c to be able to test static functions
in mballoc.c.
2. Implement static stubs to avoid real read IO to disk.
3. Construct a fake super_block. Only some members are set; more members
will be set as more functions are tested.
Then a unit test for ext4_mb_new_blocks_simple is added.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
fs/ext4/mballoc-test.c | 322 +++++++++++++++++++++++++++++++++++++++++
fs/ext4/mballoc.c | 4 +
2 files changed, 326 insertions(+)
create mode 100644 fs/ext4/mballoc-test.c
diff --git a/fs/ext4/mballoc-test.c b/fs/ext4/mballoc-test.c
new file mode 100644
index 000000000000..d643c56ac003
--- /dev/null
+++ b/fs/ext4/mballoc-test.c
@@ -0,0 +1,322 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KUnit test of ext4 multiblocks allocation.
+ */
+
+#include <kunit/test.h>
+#include <kunit/static_stub.h>
+
+#include "ext4.h"
+
+struct mbt_grp_ctx {
+ struct buffer_head bitmap_bh;
+	/* desc and gd_bh are just placeholders for now */
+ struct ext4_group_desc desc;
+ struct buffer_head gd_bh;
+};
+
+struct mbt_ctx {
+ struct mbt_grp_ctx *grp_ctx;
+};
+
+struct mbt_ext4_super_block {
+ struct super_block sb;
+ struct mbt_ctx mbt_ctx;
+};
+
+#define MBT_CTX(_sb) (&(container_of((_sb), struct mbt_ext4_super_block, sb)->mbt_ctx))
+#define MBT_GRP_CTX(_sb, _group) (&MBT_CTX(_sb)->grp_ctx[_group])
+
+static struct super_block *mbt_ext4_alloc_super_block(void)
+{
+ struct ext4_super_block *es = kzalloc(sizeof(*es), GFP_KERNEL);
+ struct ext4_sb_info *sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
+ struct mbt_ext4_super_block *fsb = kzalloc(sizeof(*fsb), GFP_KERNEL);
+
+ if (fsb == NULL || sbi == NULL || es == NULL)
+ goto out;
+
+ sbi->s_es = es;
+ fsb->sb.s_fs_info = sbi;
+ return &fsb->sb;
+
+out:
+ kfree(fsb);
+ kfree(sbi);
+ kfree(es);
+ return NULL;
+}
+
+static void mbt_ext4_free_super_block(struct super_block *sb)
+{
+ struct mbt_ext4_super_block *fsb = container_of(sb, struct mbt_ext4_super_block, sb);
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+ kfree(sbi->s_es);
+ kfree(sbi);
+ kfree(fsb);
+}
+
+struct mbt_ext4_block_layout {
+ unsigned char blocksize_bits;
+ unsigned int cluster_bits;
+ uint32_t blocks_per_group;
+ ext4_group_t group_count;
+ uint16_t desc_size;
+};
+
+static void mbt_init_sb_layout(struct super_block *sb,
+ struct mbt_ext4_block_layout *layout)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ struct ext4_super_block *es = sbi->s_es;
+
+ sb->s_blocksize = 1UL << layout->blocksize_bits;
+ sb->s_blocksize_bits = layout->blocksize_bits;
+
+ sbi->s_groups_count = layout->group_count;
+ sbi->s_blocks_per_group = layout->blocks_per_group;
+ sbi->s_cluster_bits = layout->cluster_bits;
+ sbi->s_cluster_ratio = 1U << layout->cluster_bits;
+ sbi->s_clusters_per_group = layout->blocks_per_group >>
+ layout->cluster_bits;
+ sbi->s_desc_size = layout->desc_size;
+
+ es->s_first_data_block = cpu_to_le32(0);
+ es->s_blocks_count_lo = cpu_to_le32(layout->blocks_per_group *
+ layout->group_count);
+}
+
+static int mbt_grp_ctx_init(struct super_block *sb,
+ struct mbt_grp_ctx *grp_ctx)
+{
+ grp_ctx->bitmap_bh.b_data = kzalloc(EXT4_BLOCK_SIZE(sb), GFP_KERNEL);
+ if (grp_ctx->bitmap_bh.b_data == NULL)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static void mbt_grp_ctx_release(struct mbt_grp_ctx *grp_ctx)
+{
+ kfree(grp_ctx->bitmap_bh.b_data);
+ grp_ctx->bitmap_bh.b_data = NULL;
+}
+
+static void mbt_ctx_mark_used(struct super_block *sb, ext4_group_t group,
+ unsigned int start, unsigned int len)
+{
+ struct mbt_grp_ctx *grp_ctx = MBT_GRP_CTX(sb, group);
+
+ mb_set_bits(grp_ctx->bitmap_bh.b_data, start, len);
+}
+
+/* called after mbt_init_sb_layout */
+static int mbt_ctx_init(struct super_block *sb)
+{
+ struct mbt_ctx *ctx = MBT_CTX(sb);
+ ext4_group_t i, ngroups = ext4_get_groups_count(sb);
+
+ ctx->grp_ctx = kcalloc(ngroups, sizeof(struct mbt_grp_ctx),
+ GFP_KERNEL);
+ if (ctx->grp_ctx == NULL)
+ return -ENOMEM;
+
+ for (i = 0; i < ngroups; i++)
+ if (mbt_grp_ctx_init(sb, &ctx->grp_ctx[i]))
+ goto out;
+
+ /*
+	 * The first data block (the first cluster in the first group) is
+	 * used by metadata. Mark it used so we never allocate a data block
+	 * there, which would fail the ext4_sb_block_valid check.
+ */
+ mb_set_bits(ctx->grp_ctx[0].bitmap_bh.b_data, 0, 1);
+
+ return 0;
+out:
+ while (i-- > 0)
+ mbt_grp_ctx_release(&ctx->grp_ctx[i]);
+ kfree(ctx->grp_ctx);
+ return -ENOMEM;
+}
+
+static void mbt_ctx_release(struct super_block *sb)
+{
+ struct mbt_ctx *ctx = MBT_CTX(sb);
+ ext4_group_t i, ngroups = ext4_get_groups_count(sb);
+
+ for (i = 0; i < ngroups; i++)
+ mbt_grp_ctx_release(&ctx->grp_ctx[i]);
+ kfree(ctx->grp_ctx);
+}
+
+static struct buffer_head *
+ext4_read_block_bitmap_nowait_stub(struct super_block *sb, ext4_group_t block_group,
+ bool ignore_locked)
+{
+ struct mbt_grp_ctx *grp_ctx = MBT_GRP_CTX(sb, block_group);
+
+ /* paired with brelse from caller of ext4_read_block_bitmap_nowait */
+ get_bh(&grp_ctx->bitmap_bh);
+ return &grp_ctx->bitmap_bh;
+}
+
+static int ext4_wait_block_bitmap_stub(struct super_block *sb,
+ ext4_group_t block_group,
+ struct buffer_head *bh)
+{
+ return 0;
+}
+
+static struct ext4_group_desc *
+ext4_get_group_desc_stub(struct super_block *sb, ext4_group_t block_group,
+ struct buffer_head **bh)
+{
+ struct mbt_grp_ctx *grp_ctx = MBT_GRP_CTX(sb, block_group);
+
+ if (bh != NULL)
+ *bh = &grp_ctx->gd_bh;
+
+ return &grp_ctx->desc;
+}
+
+static int ext4_mb_mark_context_stub(struct ext4_mark_context *mc,
+ ext4_group_t group, ext4_grpblk_t blkoff,
+ ext4_grpblk_t len, int flags)
+{
+ struct mbt_grp_ctx *grp_ctx = MBT_GRP_CTX(mc->sb, group);
+ struct buffer_head *bitmap_bh = &grp_ctx->bitmap_bh;
+
+ if (mc->state)
+ mb_set_bits(bitmap_bh->b_data, blkoff, len);
+ else
+ mb_clear_bits(bitmap_bh->b_data, blkoff, len);
+
+ return 0;
+}
+
+#define TEST_BLOCKSIZE_BITS 10
+#define TEST_CLUSTER_BITS 3
+#define TEST_BLOCKS_PER_GROUP 8192
+#define TEST_GROUP_COUNT 4
+#define TEST_DESC_SIZE 64
+#define TEST_GOAL_GROUP 1
+static int mbt_kunit_init(struct kunit *test)
+{
+ struct mbt_ext4_block_layout layout = {
+ .blocksize_bits = TEST_BLOCKSIZE_BITS,
+ .cluster_bits = TEST_CLUSTER_BITS,
+ .blocks_per_group = TEST_BLOCKS_PER_GROUP,
+ .group_count = TEST_GROUP_COUNT,
+ .desc_size = TEST_DESC_SIZE,
+ };
+ struct super_block *sb;
+ int ret;
+
+ sb = mbt_ext4_alloc_super_block();
+ if (sb == NULL)
+ return -ENOMEM;
+
+ mbt_init_sb_layout(sb, &layout);
+
+ ret = mbt_ctx_init(sb);
+ if (ret != 0) {
+ mbt_ext4_free_super_block(sb);
+ return ret;
+ }
+
+ test->priv = sb;
+ kunit_activate_static_stub(test,
+ ext4_read_block_bitmap_nowait,
+ ext4_read_block_bitmap_nowait_stub);
+ kunit_activate_static_stub(test,
+ ext4_wait_block_bitmap,
+ ext4_wait_block_bitmap_stub);
+ kunit_activate_static_stub(test,
+ ext4_get_group_desc,
+ ext4_get_group_desc_stub);
+ kunit_activate_static_stub(test,
+ ext4_mb_mark_context,
+ ext4_mb_mark_context_stub);
+ return 0;
+}
+
+static void mbt_kunit_exit(struct kunit *test)
+{
+ struct super_block *sb = (struct super_block *)test->priv;
+
+ mbt_ctx_release(sb);
+ mbt_ext4_free_super_block(sb);
+}
+
+static void test_new_blocks_simple(struct kunit *test)
+{
+ struct super_block *sb = (struct super_block *)test->priv;
+ struct inode inode = { .i_sb = sb, };
+ struct ext4_allocation_request ar;
+ ext4_group_t i, goal_group = TEST_GOAL_GROUP;
+ int err = 0;
+ ext4_fsblk_t found;
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+ ar.inode = &inode;
+
+ /* get block at goal */
+ ar.goal = ext4_group_first_block_no(sb, goal_group);
+ found = ext4_mb_new_blocks_simple(&ar, &err);
+ KUNIT_ASSERT_EQ_MSG(test, ar.goal, found,
+ "failed to alloc block at goal, expected %llu found %llu",
+ ar.goal, found);
+
+ /* get block after goal in goal group */
+ ar.goal = ext4_group_first_block_no(sb, goal_group);
+ found = ext4_mb_new_blocks_simple(&ar, &err);
+	KUNIT_ASSERT_EQ_MSG(test, ar.goal + EXT4_C2B(sbi, 1), found,
+		"failed to alloc block after goal in goal group, expected %llu found %llu",
+		ar.goal + EXT4_C2B(sbi, 1), found);
+
+ /* get block after goal group */
+ mbt_ctx_mark_used(sb, goal_group, 0, EXT4_CLUSTERS_PER_GROUP(sb));
+ ar.goal = ext4_group_first_block_no(sb, goal_group);
+ found = ext4_mb_new_blocks_simple(&ar, &err);
+ KUNIT_ASSERT_EQ_MSG(test,
+ ext4_group_first_block_no(sb, goal_group + 1), found,
+ "failed to alloc block after goal group, expected %llu found %llu",
+ ext4_group_first_block_no(sb, goal_group + 1), found);
+
+ /* get block before goal group */
+ for (i = goal_group; i < ext4_get_groups_count(sb); i++)
+ mbt_ctx_mark_used(sb, i, 0, EXT4_CLUSTERS_PER_GROUP(sb));
+ ar.goal = ext4_group_first_block_no(sb, goal_group);
+ found = ext4_mb_new_blocks_simple(&ar, &err);
+	KUNIT_ASSERT_EQ_MSG(test,
+		ext4_group_first_block_no(sb, 0) + EXT4_C2B(sbi, 1), found,
+		"failed to alloc block before goal group, expected %llu found %llu",
+		ext4_group_first_block_no(sb, 0) + EXT4_C2B(sbi, 1), found);
+
+ /* no block available, fail to allocate block */
+ for (i = 0; i < ext4_get_groups_count(sb); i++)
+ mbt_ctx_mark_used(sb, i, 0, EXT4_CLUSTERS_PER_GROUP(sb));
+ ar.goal = ext4_group_first_block_no(sb, goal_group);
+ found = ext4_mb_new_blocks_simple(&ar, &err);
+ KUNIT_ASSERT_NE_MSG(test, err, 0,
+ "unexpectedly get block when no block is available");
+}
+
+static struct kunit_case mbt_test_cases[] = {
+ KUNIT_CASE(test_new_blocks_simple),
+ {}
+};
+
+static struct kunit_suite mbt_test_suite = {
+ .name = "ext4_mballoc_test",
+ .init = mbt_kunit_init,
+ .exit = mbt_kunit_exit,
+ .test_cases = mbt_test_cases,
+};
+
+kunit_test_suites(&mbt_test_suite);
+
+MODULE_LICENSE("GPL");
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 5961b08ae7df..7d230d6317e6 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -7041,3 +7041,7 @@ ext4_mballoc_query_range(
return error;
}
+
+#ifdef CONFIG_EXT4_KUNIT_TESTS
+#include "mballoc-test.c"
+#endif
--
2.30.0
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v6 11/11] ext4: run mballoc test with different layouts setting
2023-08-26 15:50 [PATCH v6 00/11] cleanups and unit test for mballoc Kemeng Shi
` (9 preceding siblings ...)
2023-08-26 15:50 ` [PATCH v6 10/11] ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc Kemeng Shi
@ 2023-08-26 15:50 ` Kemeng Shi
2023-09-01 14:36 ` Ritesh Harjani
2023-08-29 19:02 ` [PATCH v6 00/11] cleanups and unit test for mballoc Ritesh Harjani
11 siblings, 1 reply; 36+ messages in thread
From: Kemeng Shi @ 2023-08-26 15:50 UTC (permalink / raw)
To: tytso, adilger.kernel, ritesh.list, linux-ext4, linux-kernel
Use KUNIT_CASE_PARAM to run the mballoc test with different layout settings.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
fs/ext4/mballoc-test.c | 52 ++++++++++++++++++++++++++++++------------
1 file changed, 38 insertions(+), 14 deletions(-)
diff --git a/fs/ext4/mballoc-test.c b/fs/ext4/mballoc-test.c
index d643c56ac003..af48a39c8ba2 100644
--- a/fs/ext4/mballoc-test.c
+++ b/fs/ext4/mballoc-test.c
@@ -196,21 +196,11 @@ static int ext4_mb_mark_context_stub(struct ext4_mark_context *mc,
return 0;
}
-#define TEST_BLOCKSIZE_BITS 10
-#define TEST_CLUSTER_BITS 3
-#define TEST_BLOCKS_PER_GROUP 8192
-#define TEST_GROUP_COUNT 4
-#define TEST_DESC_SIZE 64
#define TEST_GOAL_GROUP 1
static int mbt_kunit_init(struct kunit *test)
{
- struct mbt_ext4_block_layout layout = {
- .blocksize_bits = TEST_BLOCKSIZE_BITS,
- .cluster_bits = TEST_CLUSTER_BITS,
- .blocks_per_group = TEST_BLOCKS_PER_GROUP,
- .group_count = TEST_GROUP_COUNT,
- .desc_size = TEST_DESC_SIZE,
- };
+ struct mbt_ext4_block_layout *layout =
+ (struct mbt_ext4_block_layout *)(test->param_value);
struct super_block *sb;
int ret;
@@ -218,7 +208,7 @@ static int mbt_kunit_init(struct kunit *test)
if (sb == NULL)
return -ENOMEM;
- mbt_init_sb_layout(sb, &layout);
+ mbt_init_sb_layout(sb, layout);
ret = mbt_ctx_init(sb);
if (ret != 0) {
@@ -304,9 +294,43 @@ static void test_new_blocks_simple(struct kunit *test)
"unexpectedly get block when no block is available");
}
+static const struct mbt_ext4_block_layout mbt_test_layouts[] = {
+ {
+ .blocksize_bits = 10,
+ .cluster_bits = 3,
+ .blocks_per_group = 8192,
+ .group_count = 4,
+ .desc_size = 64,
+ },
+ {
+ .blocksize_bits = 12,
+ .cluster_bits = 3,
+ .blocks_per_group = 8192,
+ .group_count = 4,
+ .desc_size = 64,
+ },
+ {
+ .blocksize_bits = 18,
+ .cluster_bits = 3,
+ .blocks_per_group = 8192,
+ .group_count = 4,
+ .desc_size = 64,
+ },
+};
+
+static void mbt_show_layout(const struct mbt_ext4_block_layout *layout,
+ char *desc)
+{
+ snprintf(desc, KUNIT_PARAM_DESC_SIZE, "block_bits=%d cluster_bits=%d "
+ "blocks_per_group=%d group_count=%d desc_size=%d\n",
+ layout->blocksize_bits, layout->cluster_bits,
+ layout->blocks_per_group, layout->group_count,
+ layout->desc_size);
+}
+KUNIT_ARRAY_PARAM(mbt_layouts, mbt_test_layouts, mbt_show_layout);
static struct kunit_case mbt_test_cases[] = {
- KUNIT_CASE(test_new_blocks_simple),
+ KUNIT_CASE_PARAM(test_new_blocks_simple, mbt_layouts_gen_params),
{}
};
--
2.30.0
^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v6 00/11] cleanups and unit test for mballoc
2023-08-26 15:50 [PATCH v6 00/11] cleanups and unit test for mballoc Kemeng Shi
` (10 preceding siblings ...)
2023-08-26 15:50 ` [PATCH v6 11/11] ext4: run mballoc test with different layouts setting Kemeng Shi
@ 2023-08-29 19:02 ` Ritesh Harjani
2023-08-30 7:22 ` Kemeng Shi
11 siblings, 1 reply; 36+ messages in thread
From: Ritesh Harjani @ 2023-08-29 19:02 UTC (permalink / raw)
To: Kemeng Shi, tytso, adilger.kernel, linux-ext4, linux-kernel
Kemeng Shi <shikemeng@huaweicloud.com> writes:
> v5-v6:
>
Hi Kemeng,
Sorry for the delay in getting started on this. I am going through the
series now.
> 1. Separate block bitmap and buddy bitmap freeing in individual patch
> and rewrite the descriptions.
> 2. Remove #ifdef around KUNIT_STATIC_STUB_REDIRECT which should be
> only defined when CONFIG_KUNIT is enabled after fix [7] which was merged
> into kunit-next/kunit
> 3. Use mbt prefix to distiguish test APIs from actual kernel APIs.
> 4. Add prepare function for ext4_mark_context and rename
> ext4_mb_mark_group_bb to ext4_mb_mark_context
> 5. Support to run mballoc test with different layouts setting and
> different block size is tested.
>
> v4->v5:
> 1. WARN on free blocks to uninitialized group is removed as normal
> fast commit route may triggers this, see [1] for details. The review
> tag from Ojaswin of changed patch is also removed and a futher review
> is needed.
>
> v3->v4:
> 1. Collect Reviewed-by from Ojaswin
> 2. Do improve as Ojaswin kindly suggested: Fix typo in commit,
> WARN if try to clear bit of uninitialized group and improve
> refactoring of AGGRESSIVE_CHECK code.
> 3. Fix conflic on patch 16
> 4. Improve git log in patch 16,17
>
> v2->v3:
> 1. Fix build warnings on "ext4: add some kunit stub for mballoc kunit
> test" and "ext4: add first unit test for ext4_mb_new_blocks_simple in
> mballoc"
>
> This series is a new version of unmerged patches from [1]. The first 6
> patches of this series factor out codes to mark blocks used or freed
> which will update on disk block bitmap and group descriptor. Several
> reasons to do this:
> 1. pair behavior of alloc/free bits. For example,
> ext4_mb_new_blocks_simple will update free_clusters in struct flex_groups
> in ext4_mb_mark_bb while ext4_free_blocks_simple forgets this.
> 2. remove repeat code to read from disk, update and write back to disk.
> 3. reduce future unit test mocks to avoid real IO to update structure
> on disk.
>
> The last 2 patches add a unit test for mballoc. Before more unit tests
> are added, there are something should be discussed:
> 1. How to test static function in mballoc.c
> Currently, include mballoc-test.c in mballoc.c to test static function
> in mballoc.c from mballoc-test.c which is one way suggested in [2].
> Not sure if there is any more elegant way to test static function without
> touch mballoc.c.
> 2. How to add mocks to function in mballoc.c which may issue IO to disk
> Currently, KUNIT_STATIC_STUB_REDIRECT is added to functions as suggested
> in kunit document [3].
I will get back to this, after reviewing some of the initial few
refactoring patches.
But FYI - I noticed some compilation errors when building w/o
CONFIG_KUNIT.
../fs/ext4/balloc.c: In function ‘ext4_get_group_desc’:
../fs/ext4/balloc.c:278:2: error: implicit declaration of function ‘KUNIT_STATIC_STUB_REDIRECT’ [-Werror=implicit-function-declaration]
278 | KUNIT_STATIC_STUB_REDIRECT(ext4_get_group_desc,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
-ritesh
> 3. How to simulate a block bitmap.
> Currently, a fake buffer_head with bitmap data is returned, then no
> futher change is needed.
> If we simulate a block bitmap with an array of data structure like:
> struct test_bitmap {
> unsigned int start;
> unsigned int len;
> }
> which is suggested by Theodore in [4], then we need to add mocks to
> function which expected bitmap from bitmap_bh->b_data, like
> mb_find_next_bit, mb_find_next_zero_bit and maybe more.
>
> Would like to hear any suggestion! Thanks!
>
> I run kvm-xfstest with config "ext4/all" and "-g auto" together with
> patchset for resize, you can see detail report in [6].
>
> Unit test result is as followings:
> # ./tools/testing/kunit/kunit.py run --kunitconfig=fs/ext4/.kunitconfig --raw_output
> [18:23:36] Configuring KUnit Kernel ...
> [18:23:36] Building KUnit Kernel ...
> Populating config with:
> $ make ARCH=um O=.kunit olddefconfig
> Building with:
> $ make ARCH=um O=.kunit --jobs=88
> [18:23:40] Starting KUnit Kernel (1/1)...
> KTAP version 1
> 1..2
> KTAP version 1
> # Subtest: ext4_mballoc_test
> 1..1
> KTAP version 1
> # Subtest: test_new_blocks_simple
> ok 1 block_bits=10 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
> ok 2 block_bits=12 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
> ok 3 block_bits=18 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
> # test_new_blocks_simple: pass:3 fail:0 skip:0 total:3
> ok 1 test_new_blocks_simple
> # Totals: pass:3 fail:0 skip:0 total:3
> ok 1 ext4_mballoc_test
> KTAP version 1
> # Subtest: ext4_inode_test
> 1..1
> KTAP version 1
> # Subtest: inode_test_xtimestamp_decoding
> ok 1 1901-12-13 Lower bound of 32bit < 0 timestamp, no extra bits
> ok 2 1969-12-31 Upper bound of 32bit < 0 timestamp, no extra bits
> ok 3 1970-01-01 Lower bound of 32bit >=0 timestamp, no extra bits
> ok 4 2038-01-19 Upper bound of 32bit >=0 timestamp, no extra bits
> ok 5 2038-01-19 Lower bound of 32bit <0 timestamp, lo extra sec bit on
> ok 6 2106-02-07 Upper bound of 32bit <0 timestamp, lo extra sec bit on
> ok 7 2106-02-07 Lower bound of 32bit >=0 timestamp, lo extra sec bit on
> ok 8 2174-02-25 Upper bound of 32bit >=0 timestamp, lo extra sec bit on
> ok 9 2174-02-25 Lower bound of 32bit <0 timestamp, hi extra sec bit on
> ok 10 2242-03-16 Upper bound of 32bit <0 timestamp, hi extra sec bit on
> ok 11 2242-03-16 Lower bound of 32bit >=0 timestamp, hi extra sec bit on
> ok 12 2310-04-04 Upper bound of 32bit >=0 timestamp, hi extra sec bit on
> ok 13 2310-04-04 Upper bound of 32bit>=0 timestamp, hi extra sec bit 1. 1 ns
> ok 14 2378-04-22 Lower bound of 32bit>= timestamp. Extra sec bits 1. Max ns
> ok 15 2378-04-22 Lower bound of 32bit >=0 timestamp. All extra sec bits on
> ok 16 2446-05-10 Upper bound of 32bit >=0 timestamp. All extra sec bits on
> # inode_test_xtimestamp_decoding: pass:16 fail:0 skip:0 total:16
> ok 1 inode_test_xtimestamp_decoding
> # Totals: pass:16 fail:0 skip:0 total:16
> ok 2 ext4_inode_test
> [18:23:41] Elapsed time: 4.960s total, 0.001s configuring, 4.842s building, 0.072s running
>
> [1]
> https://lore.kernel.org/linux-ext4/20230603150327.3596033-1-shikemeng@huaweicloud.com/T/#m5ff8e3a058ce1cb272dfef3262cd3202ce6e4358
> [2]
> https://lore.kernel.org/linux-ext4/ZC3MoWn2UO6p+Swp@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com/
> [3]
> https://docs.kernel.org/dev-tools/kunit/usage.html#testing-static-functions
> [4]
> https://docs.kernel.org/dev-tools/kunit/api/functionredirection.html#c.KUNIT_STATIC_STUB_REDIRECT
> [5]
> https://lore.kernel.org/linux-ext4/20230317155047.GB3270589@mit.edu/
> [6]
> https://lore.kernel.org/linux-ext4/20230629120044.1261968-1-shikemeng@huaweicloud.com/T/#mcc8fb0697fd54d9267c02c027e1eb3468026ae56
> [7]
> https://lore.kernel.org/lkml/20230725172051.2142641-1-shikemeng@huaweicloud.com/
>
> Kemeng Shi (11):
> ext4: factor out codes to update block bitmap and group descriptor on
> disk from ext4_mb_mark_bb
> ext4: call ext4_mb_mark_context in ext4_free_blocks_simple
> ext4: extent ext4_mb_mark_context to support allocation under journal
> ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used
> ext4: Separate block bitmap and buddy bitmap freeing in
> ext4_mb_clear_bb()
> ext4: call ext4_mb_mark_context in ext4_mb_clear_bb
> ext4: Separate block bitmap and buddy bitmap freeing in
> ext4_group_add_blocks()
> ext4: call ext4_mb_mark_context in ext4_group_add_blocks()
> ext4: add some kunit stub for mballoc kunit test
> ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc
> ext4: run mballoc test with different layouts setting
>
> fs/ext4/balloc.c | 10 +
> fs/ext4/mballoc-test.c | 351 ++++++++++++++++++++++++++++
> fs/ext4/mballoc.c | 505 ++++++++++++++++-------------------------
> 3 files changed, 557 insertions(+), 309 deletions(-)
> create mode 100644 fs/ext4/mballoc-test.c
>
> --
> 2.30.0
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v6 00/11] cleanups and unit test for mballoc
2023-08-29 19:02 ` [PATCH v6 00/11] cleanups and unit test for mballoc Ritesh Harjani
@ 2023-08-30 7:22 ` Kemeng Shi
2023-08-31 14:35 ` Ritesh Harjani
0 siblings, 1 reply; 36+ messages in thread
From: Kemeng Shi @ 2023-08-30 7:22 UTC (permalink / raw)
To: Ritesh Harjani, tytso, adilger.kernel, linux-ext4, linux-kernel
on 8/30/2023 3:02 AM, Ritesh Harjani wrote:
> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>
>> v5-v6:
>>
>
> Hi Kemeng,
>
> Sorry for the delay in getting started on this. I am going through the
> series now.
>
>> 1. Separate block bitmap and buddy bitmap freeing in individual patch
>> and rewrite the descriptions.
>> 2. Remove #ifdef around KUNIT_STATIC_STUB_REDIRECT which should be
>> only defined when CONFIG_KUNIT is enabled after fix [7] which was merged
>> into kunit-next/kunit
Hi Ritesh, thanks for the feedback. I think the compilation problem below
is related to this change, which relies on fix [7]. I'm not sure whether I
need to include fix [7] in this set to fix the compilation error. Would
like to hear any advice!
>
> I will get back to this, after reviewing some of the initial few
> refactoring patches.
>
> But FYI - I noticed some compilation errors when building w/o
> CONFIG_KUNIT.
>
> ../fs/ext4/balloc.c: In function ‘ext4_get_group_desc’:
> ../fs/ext4/balloc.c:278:2: error: implicit declaration of function ‘KUNIT_STATIC_STUB_REDIRECT’ [-Werror=implicit-function-declaration]
> 278 | KUNIT_STATIC_STUB_REDIRECT(ext4_get_group_desc,
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~
> -ritesh
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v6 01/11] ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb
2023-08-26 15:50 ` [PATCH v6 01/11] ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb Kemeng Shi
@ 2023-08-31 12:33 ` Ritesh Harjani
2023-08-31 13:42 ` Kemeng Shi
0 siblings, 1 reply; 36+ messages in thread
From: Ritesh Harjani @ 2023-08-31 12:33 UTC (permalink / raw)
To: Kemeng Shi, tytso, adilger.kernel, linux-ext4, linux-kernel
Kemeng Shi <shikemeng@huaweicloud.com> writes:
Hello Kemeng,
> There are several reasons to add a general function to update block
> bitmap and group descriptor on disk:
... named ext4_mb_mark_context(<params>)
> 1. pair behavior of alloc/free bits. For example,
> ext4_mb_new_blocks_simple will update free_clusters in struct flex_groups
> in ext4_mb_mark_bb while ext4_free_blocks_simple forgets this.
> 2. remove repeat code to read from disk, update and write back to disk.
> 3. reduce future unit test mocks to catch real IO to update structure
> on disk.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
> ---
> fs/ext4/mballoc.c | 169 +++++++++++++++++++++++++++-------------------
> 1 file changed, 99 insertions(+), 70 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index c91db9f57524..e2be572deb75 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -3952,6 +3952,100 @@ void ext4_exit_mballoc(void)
> ext4_groupinfo_destroy_slabs();
> }
>
> +/*
> + * Collect global setting to reduce the number of variable passing to
> + * ext4_mb_mark_context. Pass target group blocks range directly to
> + * reuse the prepared global setting for multiple block ranges and
> + * to show clearly the specific block range will be marked.
> + */
> +struct ext4_mark_context {
> + struct super_block *sb;
> + int state;
> +};
This structure definition does not reflect its naming.
Why can't we also add cblk, clen & flags to it?
I think the idea of defining a new function named
ext4_mb_prepare_mark_context() was that we can prepare "struct ext4_mark_context"
with different cblk, clen & flags arguments for cases where we might
have to call ext4_mb_mark_context() more than once in the same function
or call ext4_mb_mark_context() anywhere but at the start of the function.
As I see it in the current series, we are calling
ext4_mb_prepare_mark_context() at the start of every function. Just for
this purpose we don't need an extra function, right? That we can do directly
at the time of declaring the structure variable itself (like you did
in the previous version).
What do you think of the approach where we add cblk, clen & flags
variables to ext4_mark_context()? Do you see any problems/difficulties
with that design?
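To make the suggestion concrete, here is a rough sketch of the layout I
have in mind (member and parameter names are only illustrative, not from
the patch):

struct ext4_mark_context {
	struct super_block *sb;
	int state;
	ext4_grpblk_t cblk;	/* start of the cluster range to mark */
	ext4_grpblk_t clen;	/* length of the cluster range */
	int flags;
};

static inline void ext4_mb_prepare_mark_context(struct ext4_mark_context *mc,
						struct super_block *sb,
						int state, ext4_grpblk_t cblk,
						ext4_grpblk_t clen, int flags)
{
	mc->sb = sb;
	mc->state = state;
	mc->cblk = cblk;
	mc->clen = clen;
	mc->flags = flags;
}

Callers would then re-prepare the context before every
ext4_mb_mark_context(&mc, group) call, instead of passing the range as
separate arguments each time.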
> +
> +static inline void ext4_mb_prepare_mark_context(struct ext4_mark_context *mc,
> + struct super_block *sb,
> + int state)
> +{
> + mc->sb = sb;
> + mc->state = state;
> +}
> +
> +static int
> +ext4_mb_mark_context(struct ext4_mark_context *mc, ext4_group_t group,
> + ext4_grpblk_t blkoff, ext4_grpblk_t len)
> +{
> + struct super_block *sb = mc->sb;
> + struct ext4_sb_info *sbi = EXT4_SB(sb);
> + struct buffer_head *bitmap_bh = NULL;
> + struct ext4_group_desc *gdp;
> + struct buffer_head *gdp_bh;
> + int err;
> + unsigned int i, already, changed;
> +
> + bitmap_bh = ext4_read_block_bitmap(sb, group);
> + if (IS_ERR(bitmap_bh))
> + return PTR_ERR(bitmap_bh);
> +
> + err = -EIO;
> + gdp = ext4_get_group_desc(sb, group, &gdp_bh);
> + if (!gdp)
> + goto out_err;
> +
> + ext4_lock_group(sb, group);
> + if (ext4_has_group_desc_csum(sb) &&
> + (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
> + gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
> + ext4_free_group_clusters_set(sb, gdp,
> + ext4_free_clusters_after_init(sb, group, gdp));
> + }
> +
> + already = 0;
> + for (i = 0; i < len; i++)
> + if (mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
> + mc->state)
> + already++;
> + changed = len - already;
> +
> + if (mc->state) {
> + mb_set_bits(bitmap_bh->b_data, blkoff, len);
> + ext4_free_group_clusters_set(sb, gdp,
> + ext4_free_group_clusters(sb, gdp) - changed);
> + } else {
> + mb_clear_bits(bitmap_bh->b_data, blkoff, len);
> + ext4_free_group_clusters_set(sb, gdp,
> + ext4_free_group_clusters(sb, gdp) + changed);
> + }
> +
> + ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
> + ext4_group_desc_csum_set(sb, group, gdp);
> + ext4_unlock_group(sb, group);
> +
> + if (sbi->s_log_groups_per_flex) {
> + ext4_group_t flex_group = ext4_flex_group(sbi, group);
> + struct flex_groups *fg = sbi_array_rcu_deref(sbi,
> + s_flex_groups, flex_group);
> +
> + if (mc->state)
> + atomic64_sub(changed, &fg->free_clusters);
> + else
> + atomic64_add(changed, &fg->free_clusters);
> + }
> +
> + err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
> + if (err)
> + goto out_err;
> + err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
> + if (err)
> + goto out_err;
> +
> + sync_dirty_buffer(bitmap_bh);
> + sync_dirty_buffer(gdp_bh);
> +
> +out_err:
> + brelse(bitmap_bh);
> + return err;
> +}
>
> /*
> * Check quota and mark chosen space (ac->ac_b_ex) non-free in bitmaps
> @@ -4078,16 +4172,14 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
> void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
> int len, int state)
> {
> - struct buffer_head *bitmap_bh = NULL;
> - struct ext4_group_desc *gdp;
> - struct buffer_head *gdp_bh;
> + struct ext4_mark_context mc;
> struct ext4_sb_info *sbi = EXT4_SB(sb);
> ext4_group_t group;
> ext4_grpblk_t blkoff;
> - int i, err = 0;
> - int already;
> - unsigned int clen, clen_changed, thisgrp_len;
> + int err = 0;
> + unsigned int clen, thisgrp_len;
>
> + ext4_mb_prepare_mark_context(&mc, sb, state);
> while (len > 0) {
> ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
>
> @@ -4107,80 +4199,17 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
> ext4_error(sb, "Marking blocks in system zone - "
> "Block = %llu, len = %u",
> block, thisgrp_len);
> - bitmap_bh = NULL;
> - break;
> - }
> -
> - bitmap_bh = ext4_read_block_bitmap(sb, group);
> - if (IS_ERR(bitmap_bh)) {
> - err = PTR_ERR(bitmap_bh);
> - bitmap_bh = NULL;
> break;
> }
>
> - err = -EIO;
> - gdp = ext4_get_group_desc(sb, group, &gdp_bh);
> - if (!gdp)
> - break;
> -
> - ext4_lock_group(sb, group);
> - already = 0;
> - for (i = 0; i < clen; i++)
> - if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
> - !state)
> - already++;
> -
> - clen_changed = clen - already;
> - if (state)
> - mb_set_bits(bitmap_bh->b_data, blkoff, clen);
> - else
> - mb_clear_bits(bitmap_bh->b_data, blkoff, clen);
> - if (ext4_has_group_desc_csum(sb) &&
> - (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
> - gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
> - ext4_free_group_clusters_set(sb, gdp,
> - ext4_free_clusters_after_init(sb, group, gdp));
> - }
> - if (state)
> - clen = ext4_free_group_clusters(sb, gdp) - clen_changed;
> - else
> - clen = ext4_free_group_clusters(sb, gdp) + clen_changed;
> -
> - ext4_free_group_clusters_set(sb, gdp, clen);
> - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
> - ext4_group_desc_csum_set(sb, group, gdp);
> -
> - ext4_unlock_group(sb, group);
> -
> - if (sbi->s_log_groups_per_flex) {
> - ext4_group_t flex_group = ext4_flex_group(sbi, group);
> - struct flex_groups *fg = sbi_array_rcu_deref(sbi,
> - s_flex_groups, flex_group);
> -
> - if (state)
> - atomic64_sub(clen_changed, &fg->free_clusters);
> - else
> - atomic64_add(clen_changed, &fg->free_clusters);
> -
> - }
> -
> - err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
> - if (err)
> - break;
> - sync_dirty_buffer(bitmap_bh);
> - err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
> - sync_dirty_buffer(gdp_bh);
> + err = ext4_mb_mark_context(&mc, group, blkoff, clen);
> if (err)
> break;
>
> block += thisgrp_len;
> len -= thisgrp_len;
> - brelse(bitmap_bh);
> BUG_ON(len < 0);
> }
> -
> - if (err)
> - brelse(bitmap_bh);
> }
>
> /*
> --
> 2.30.0
-ritesh
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v6 01/11] ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb
2023-08-31 12:33 ` Ritesh Harjani
@ 2023-08-31 13:42 ` Kemeng Shi
2023-08-31 14:07 ` Ritesh Harjani
0 siblings, 1 reply; 36+ messages in thread
From: Kemeng Shi @ 2023-08-31 13:42 UTC (permalink / raw)
To: Ritesh Harjani, tytso, adilger.kernel, linux-ext4, linux-kernel
on 8/31/2023 8:33 PM, Ritesh Harjani wrote:
> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>
> Hello Kemeng,
>
>> There are several reasons to add a general function to update block
>> bitmap and group descriptor on disk:
>
> ... named ext4_mb_mark_context(<params>)
>
>> 1. pair behavior of alloc/free bits. For example,
>> ext4_mb_new_blocks_simple will update free_clusters in struct flex_groups
>> in ext4_mb_mark_bb while ext4_free_blocks_simple forgets this.
>> 2. remove repeat code to read from disk, update and write back to disk.
>> 3. reduce future unit test mocks to catch real IO to update structure
>> on disk.
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
>> ---
>> fs/ext4/mballoc.c | 169 +++++++++++++++++++++++++++-------------------
>> 1 file changed, 99 insertions(+), 70 deletions(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index c91db9f57524..e2be572deb75 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -3952,6 +3952,100 @@ void ext4_exit_mballoc(void)
>> ext4_groupinfo_destroy_slabs();
>> }
>>
>> +/*
>> + * Collect global setting to reduce the number of variable passing to
>> + * ext4_mb_mark_context. Pass target group blocks range directly to
>> + * reuse the prepared global setting for multiple block ranges and
>> + * to show clearly the specific block range will be marked.
>> + */
>> +struct ext4_mark_context {
>> + struct super_block *sb;
>> + int state;
>> +};
>
>> This structure definition does not reflect its naming.
>> Why can't we also add cblk, clen & flags to it?
>
> I think the idea of defining a new function named
> ext4_mb_prepare_mark_context() was that we can prepare "struct ext4_mark_context"
> with different cblk, clen & flags arguments for cases where we might
> have to call ext4_mb_mark_context() more than once in the same function
> or call ext4_mb_mark_context() anywhere but at the start of the function.
>
> As I see it in the current series, we are calling
> ext4_mb_prepare_mark_context() at the start of every function. Just for
>> this purpose we don't need an extra function, right? That we can do directly
>> at the time of declaring the structure variable itself (like you did
>> in the previous version).
>
Hi Ritesh, thanks for the reply. The ext4_mark_context structure aims to
reduce the number of variables passed to ext4_mb_mark_context. If we have
to prepare many members in ext4_mb_prepare_mark_context, then the
too-many-variables issue simply moves to ext4_mb_prepare_mark_context.
The name ext4_mark_context may not be proper. Actually, I want a structure
to collect the information which is not strongly relevant to marking blk
bits. In this way, we can initialize it at the beginning of the function,
and then there is no need to pay attention to those members or to pass
them individually in each call to ext4_mb_mark_context. Instead, we focus
on the useful information passed with each ext4_mb_mark_context call.
This design also achieves the goal of preparing the context once for
multiple uses in the same function, as ext4_mark_context is unlikely to
change after initialization at the beginning.
> What do you think of the approach where we add cblk, clen & flags
> variables to ext4_mark_context()? Do you see any problems/difficulties
> with that design?
>
The design you propose also looks good to me. Please let me know if you
still prefer it and I will do it in the next version. Thanks!
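To make the current design concrete, a minimal sketch of the intended
call pattern (error handling omitted; sb, state, group, blkoff, clen and
clen_rest stand for whatever values the caller already has):

	struct ext4_mark_context mc;
	int err;

	/* settings that do not change across ranges are prepared once */
	ext4_mb_prepare_mark_context(&mc, sb, state);

	/* only the block range, which does vary, is passed per call */
	err = ext4_mb_mark_context(&mc, group, blkoff, clen);
	if (!err)
		err = ext4_mb_mark_context(&mc, group + 1, 0, clen_rest);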
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v6 01/11] ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb
2023-08-31 13:42 ` Kemeng Shi
@ 2023-08-31 14:07 ` Ritesh Harjani
2023-09-04 2:50 ` Kemeng Shi
0 siblings, 1 reply; 36+ messages in thread
From: Ritesh Harjani @ 2023-08-31 14:07 UTC (permalink / raw)
To: Kemeng Shi, tytso, adilger.kernel, linux-ext4, linux-kernel
Kemeng Shi <shikemeng@huaweicloud.com> writes:
> on 8/31/2023 8:33 PM, Ritesh Harjani wrote:
>> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>>
>> Hello Kemeng,
>>
>>> There are several reasons to add a general function to update block
>>> bitmap and group descriptor on disk:
>>
>> ... named ext4_mb_mark_context(<params>)
>>
>>> 1. pair behavior of alloc/free bits. For example,
>>> ext4_mb_new_blocks_simple will update free_clusters in struct flex_groups
>>> in ext4_mb_mark_bb while ext4_free_blocks_simple forgets this.
>>> 2. remove repeat code to read from disk, update and write back to disk.
>>> 3. reduce future unit test mocks to catch real IO to update structure
>>> on disk.
>>>
>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
>>> ---
>>> fs/ext4/mballoc.c | 169 +++++++++++++++++++++++++++-------------------
>>> 1 file changed, 99 insertions(+), 70 deletions(-)
>>>
>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>> index c91db9f57524..e2be572deb75 100644
>>> --- a/fs/ext4/mballoc.c
>>> +++ b/fs/ext4/mballoc.c
>>> @@ -3952,6 +3952,100 @@ void ext4_exit_mballoc(void)
>>> ext4_groupinfo_destroy_slabs();
>>> }
>>>
>>> +/*
>>> + * Collect global setting to reduce the number of variable passing to
>>> + * ext4_mb_mark_context. Pass target group blocks range directly to
>>> + * reuse the prepared global setting for multiple block ranges and
>>> + * to show clearly the specific block range will be marked.
>>> + */
>>> +struct ext4_mark_context {
>>> + struct super_block *sb;
>>> + int state;
>>> +};
>>
>> This structure definition does not reflect its naming.
>> Why can't we also add cblk, clen & flags to it?
>>
>> I think the idea of defining a new function named
>> ext4_mb_prepare_mark_context() was that we can prepare "struct ext4_mark_context"
>> with different cblk, clen & flags arguments for cases where we might
>> have to call ext4_mb_mark_context() more than once in the same function
>> or call ext4_mb_mark_context() anywhere but at the start of the function.
>>
>> As I see it in the current series, we are calling
>> ext4_mb_prepare_mark_context() at the start of every function. Just for
>> this purpose we don't need an extra function, right? That we can do directly
>> at the time of declaring the structure variable itself (like you did
>> in the previous version).
>>
> Hi Ritesh, thanks for the reply. The ext4_mark_context structure aims to
> reduce the number of variables passed to ext4_mb_mark_context. If we have
> to prepare many members in ext4_mb_prepare_mark_context, then the
> too-many-variables issue simply moves to ext4_mb_prepare_mark_context.
> The name ext4_mark_context may not be proper. Actually, I want a structure
> to collect the information which is not strongly relevant to marking blk
> bits. In this way, we can initialize it at the beginning of the function,
> and then there is no need to pay attention to those members or to pass
> them individually in each call to ext4_mb_mark_context. Instead, we focus
> on the useful information passed with each ext4_mb_mark_context call.
> This design also achieves the goal of preparing the context once for
> multiple uses in the same function, as ext4_mark_context is unlikely to
> change after initialization at the beginning.
>> What do you think of the approach where we add cblk, clen & flags
>> variables to ext4_mark_context()? Do you see any problems/difficulties
>> with that design?
>>
> The design you propose also looks good to me. Please let me know if you
> still prefer it and I will do it in the next version. Thanks!
>
I would still have preferred the block and len arguments inside struct
ext4_mark_context, because that better explains the use and definition of
the structure and its prepare function.
However, since this is not a functionality change, I am fine if you
prefer the current design (as you mentioned above).
We can always discuss & change it later too :)
Otherwise, the refactoring changes look good to me.
Please feel free to add -
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Thanks!
-ritesh
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v6 02/11] ext4: call ext4_mb_mark_context in ext4_free_blocks_simple
2023-08-26 15:50 ` [PATCH v6 02/11] ext4: call ext4_mb_mark_context in ext4_free_blocks_simple Kemeng Shi
@ 2023-08-31 14:25 ` Ritesh Harjani
2023-09-04 2:51 ` Kemeng Shi
0 siblings, 1 reply; 36+ messages in thread
From: Ritesh Harjani @ 2023-08-31 14:25 UTC (permalink / raw)
To: Kemeng Shi, tytso, adilger.kernel, linux-ext4, linux-kernel
Kemeng Shi <shikemeng@huaweicloud.com> writes:
> call ext4_mb_mark_context in ext4_free_blocks_simple to:
> 1. remove repeat code
> 2. pair update of free_clusters in ext4_mb_new_blocks_simple.
> 3. add missing ext4_lock_group/ext4_unlock_group protection.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
> fs/ext4/mballoc.c | 35 +++--------------------------------
> 1 file changed, 3 insertions(+), 32 deletions(-)
Looks good to me. Please feel free to add -
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
(One small comment below for the previous patch)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index e2be572deb75..c803f74aaf63 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -6414,43 +6414,14 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
> static void ext4_free_blocks_simple(struct inode *inode, ext4_fsblk_t block,
> unsigned long count)
> {
> - struct buffer_head *bitmap_bh;
> + struct ext4_mark_context mc;
> struct super_block *sb = inode->i_sb;
> - struct ext4_group_desc *gdp;
> - struct buffer_head *gdp_bh;
> ext4_group_t group;
> ext4_grpblk_t blkoff;
> - int already_freed = 0, err, i;
>
> + ext4_mb_prepare_mark_context(&mc, sb, 0);
It looks like we always use 0 or 1 as the state for struct
ext4_mark_context. In that case we can keep the state member of this struct
as bool instead of int.
> ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
> - bitmap_bh = ext4_read_block_bitmap(sb, group);
> - if (IS_ERR(bitmap_bh)) {
> - pr_warn("Failed to read block bitmap\n");
> - return;
> - }
> - gdp = ext4_get_group_desc(sb, group, &gdp_bh);
> - if (!gdp)
> - goto err_out;
> -
> - for (i = 0; i < count; i++) {
> - if (!mb_test_bit(blkoff + i, bitmap_bh->b_data))
> - already_freed++;
> - }
> - mb_clear_bits(bitmap_bh->b_data, blkoff, count);
> - err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
> - if (err)
> - goto err_out;
> - ext4_free_group_clusters_set(
> - sb, gdp, ext4_free_group_clusters(sb, gdp) +
> - count - already_freed);
> - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
> - ext4_group_desc_csum_set(sb, group, gdp);
> - ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
> - sync_dirty_buffer(bitmap_bh);
> - sync_dirty_buffer(gdp_bh);
> -
> -err_out:
> - brelse(bitmap_bh);
> + ext4_mb_mark_context(&mc, group, blkoff, count);
> }
>
> /**
> --
> 2.30.0
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v6 00/11] cleanups and unit test for mballoc
2023-08-30 7:22 ` Kemeng Shi
@ 2023-08-31 14:35 ` Ritesh Harjani
0 siblings, 0 replies; 36+ messages in thread
From: Ritesh Harjani @ 2023-08-31 14:35 UTC (permalink / raw)
To: Kemeng Shi, tytso, adilger.kernel, linux-ext4, linux-kernel
Kemeng Shi <shikemeng@huaweicloud.com> writes:
> on 8/30/2023 3:02 AM, Ritesh Harjani wrote:
>> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>>
>>> v5-v6:
>>>
>>
>> Hi Kemeng,
>>
>> Sorry for the delay in getting started on this. I am going through the
>> series now.
>>
>>> 1. Separate block bitmap and buddy bitmap freeing in individual patch
>>> and rewrite the descriptions.
>>> 2. Remove #ifdef around KUNIT_STATIC_STUB_REDIRECT which should be
>>> only defined when CONFIG_KUNIT is enabled after fix [7] which was merged
>>> into kunit-next/kunit
> Hi Ritesh, thanks for the feedback. I think the compilation problem below
> is related to this change, which relies on fix [7]. I'm not sure whether I
> need to include fix [7] in this set to fix the compilation error. Would
> like to hear any advice!
>
No, we need not include [7] in this series. I should have noticed that
in your above updates. I generally also provide a separate note or at
the top section about which branch and/or changes are requried for
testing the patch series, hence I overlooked it.
Thanks for pointing out.
-ritesh
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v6 03/11] ext4: extent ext4_mb_mark_context to support allocation under journal
2023-08-26 15:50 ` [PATCH v6 03/11] ext4: extent ext4_mb_mark_context to support allocation under journal Kemeng Shi
@ 2023-08-31 15:51 ` Ritesh Harjani
0 siblings, 0 replies; 36+ messages in thread
From: Ritesh Harjani @ 2023-08-31 15:51 UTC (permalink / raw)
To: Kemeng Shi, tytso, adilger.kernel, linux-ext4, linux-kernel
Kemeng Shi <shikemeng@huaweicloud.com> writes:
commit: extent ext4_mb_mark_context to support allocation under journal
^^^ extend
> Previously, ext4_mb_mark_context is only called under fast commit
> replay path, so there is no valid handle when we update block bitmap
> and group descriptor. This patch try to extent ext4_mb_mark_context
^^^^ extend
> to be used by code under journal. There are several improves:
^^^ improvement:
> 1. add "handle_t *handle" to struct ext4_mark_context to accept handle
> to journal block bitmap and group descriptor update inside
> ext4_mb_mark_context (the added journal caode is based on journal
^^^ code
(we can remove the next "journal code in")
> code in ext4_mb_mark_diskspace_used where ext4_mb_mark_context
> is going to be used.)
> 2. add EXT4_MB_BITMAP_MARKED_CHECK flag to control check if bits in block
> bitmap are already marked as allocation code under journal asserts that
> all bits to be changed are not marked before.
> 3. add "ext4_grpblk_t changed" to struct ext4_mark_context to notify number
> of bits in block bitmap has changed.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
The patch functionally looks good to me. Please feel free to add -
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
-ritesh
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v6 04/11] ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used
2023-08-26 15:50 ` [PATCH v6 04/11] ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used Kemeng Shi
@ 2023-09-01 3:51 ` Ritesh Harjani
2023-09-04 2:54 ` Kemeng Shi
0 siblings, 1 reply; 36+ messages in thread
From: Ritesh Harjani @ 2023-09-01 3:51 UTC (permalink / raw)
To: Kemeng Shi, tytso, adilger.kernel, linux-ext4, linux-kernel
Kemeng Shi <shikemeng@huaweicloud.com> writes:
> call ext4_mb_mark_context in ext4_mb_mark_diskspace_used to:
> 1. remove repeat code to normally update bitmap and group descriptor
> on disk.
> 2. call ext4_mb_mark_context instead of only setting bits in block bitmap
> to fix the bitmap. Function ext4_mb_mark_context will also update
> checksum of bitmap and other counter along with the bit change to keep
> the cosistent with bit change or block bitmap will be marked corrupted as
> checksum of bitmap is in inconsistent state.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
> ---
> fs/ext4/mballoc.c | 86 +++++++++++------------------------------------
> 1 file changed, 20 insertions(+), 66 deletions(-)
I was wondering whether checking for !ext4_inode_block_valid() can also
be part of ext4_mb_mark_context() by passing an
EXT4_MB_METABLOCKS_VALID_CHECK flag.
But as for this patch, it looks good to me. Please feel free to add -
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
-ritesh
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index b066ee018cdb..e650eac22237 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4084,46 +4084,28 @@ static noinline_for_stack int
> ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
> handle_t *handle, unsigned int reserv_clstrs)
> {
> - struct buffer_head *bitmap_bh = NULL;
> + struct ext4_mark_context mc;
> struct ext4_group_desc *gdp;
> - struct buffer_head *gdp_bh;
> struct ext4_sb_info *sbi;
> struct super_block *sb;
> ext4_fsblk_t block;
> int err, len;
> + int flags = 0;
>
> BUG_ON(ac->ac_status != AC_STATUS_FOUND);
> BUG_ON(ac->ac_b_ex.fe_len <= 0);
>
> sb = ac->ac_sb;
> sbi = EXT4_SB(sb);
> + ext4_mb_prepare_mark_context(&mc, handle, sb, 1);
>
> - bitmap_bh = ext4_read_block_bitmap(sb, ac->ac_b_ex.fe_group);
> - if (IS_ERR(bitmap_bh)) {
> - return PTR_ERR(bitmap_bh);
> - }
> -
> - BUFFER_TRACE(bitmap_bh, "getting write access");
> - err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
> - EXT4_JTR_NONE);
> - if (err)
> - goto out_err;
> -
> - err = -EIO;
> - gdp = ext4_get_group_desc(sb, ac->ac_b_ex.fe_group, &gdp_bh);
> + gdp = ext4_get_group_desc(sb, ac->ac_b_ex.fe_group, NULL);
> if (!gdp)
> - goto out_err;
> -
> + return -EIO;
> ext4_debug("using block group %u(%d)\n", ac->ac_b_ex.fe_group,
> ext4_free_group_clusters(sb, gdp));
>
> - BUFFER_TRACE(gdp_bh, "get_write_access");
> - err = ext4_journal_get_write_access(handle, sb, gdp_bh, EXT4_JTR_NONE);
> - if (err)
> - goto out_err;
> -
> block = ext4_grp_offs_to_block(sb, &ac->ac_b_ex);
> -
> len = EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
> if (!ext4_inode_block_valid(ac->ac_inode, block, len)) {
> ext4_error(sb, "Allocating blocks %llu-%llu which overlap "
> @@ -4132,41 +4114,28 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
> * Fix the bitmap and return EFSCORRUPTED
> * We leak some of the blocks here.
> */
> - ext4_lock_group(sb, ac->ac_b_ex.fe_group);
> - mb_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start,
> - ac->ac_b_ex.fe_len);
> - ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
> - err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
> + err = ext4_mb_mark_context(&mc, ac->ac_b_ex.fe_group,
> + ac->ac_b_ex.fe_start,
> + ac->ac_b_ex.fe_len,
> + 0);
> if (!err)
> err = -EFSCORRUPTED;
> - goto out_err;
> + return err;
> }
>
> - ext4_lock_group(sb, ac->ac_b_ex.fe_group);
> #ifdef AGGRESSIVE_CHECK
> - {
> - int i;
> - for (i = 0; i < ac->ac_b_ex.fe_len; i++) {
> - BUG_ON(mb_test_bit(ac->ac_b_ex.fe_start + i,
> - bitmap_bh->b_data));
> - }
> - }
> + flags |= EXT4_MB_BITMAP_MARKED_CHECK;
> #endif
> - mb_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start,
> - ac->ac_b_ex.fe_len);
> - if (ext4_has_group_desc_csum(sb) &&
> - (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
> - gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
> - ext4_free_group_clusters_set(sb, gdp,
> - ext4_free_clusters_after_init(sb,
> - ac->ac_b_ex.fe_group, gdp));
> - }
> - len = ext4_free_group_clusters(sb, gdp) - ac->ac_b_ex.fe_len;
> - ext4_free_group_clusters_set(sb, gdp, len);
> - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
> - ext4_group_desc_csum_set(sb, ac->ac_b_ex.fe_group, gdp);
> + err = ext4_mb_mark_context(&mc, ac->ac_b_ex.fe_group,
> + ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len,
> + flags);
> +
> + if (err && mc.changed == 0)
> + return err;
>
> - ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
> +#ifdef AGGRESSIVE_CHECK
> + BUG_ON(mc.changed != ac->ac_b_ex.fe_len);
> +#endif
> percpu_counter_sub(&sbi->s_freeclusters_counter, ac->ac_b_ex.fe_len);
> /*
> * Now reduce the dirty block count also. Should not go negative
> @@ -4176,21 +4145,6 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
> percpu_counter_sub(&sbi->s_dirtyclusters_counter,
> reserv_clstrs);
>
> - if (sbi->s_log_groups_per_flex) {
> - ext4_group_t flex_group = ext4_flex_group(sbi,
> - ac->ac_b_ex.fe_group);
> - atomic64_sub(ac->ac_b_ex.fe_len,
> - &sbi_array_rcu_deref(sbi, s_flex_groups,
> - flex_group)->free_clusters);
> - }
> -
> - err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
> - if (err)
> - goto out_err;
> - err = ext4_handle_dirty_metadata(handle, NULL, gdp_bh);
> -
> -out_err:
> - brelse(bitmap_bh);
> return err;
> }
>
> --
> 2.30.0
* Re: [PATCH v6 05/11] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb()
2023-08-26 15:50 ` [PATCH v6 05/11] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb() Kemeng Shi
@ 2023-09-01 9:34 ` Ritesh Harjani
2023-09-04 3:00 ` Kemeng Shi
0 siblings, 1 reply; 36+ messages in thread
From: Ritesh Harjani @ 2023-09-01 9:34 UTC (permalink / raw)
To: Kemeng Shi, tytso, adilger.kernel, linux-ext4, linux-kernel
Kemeng Shi <shikemeng@huaweicloud.com> writes:
> This patch separates block bitmap and buddy bitmap freeing in order to
> udpate block bitmap with ext4_mb_mark_context in following patch.
^^^ update.
>
> Separated freeing is safe with concurrent allocation as long as:
> 1. Firstly allocate block in buddy bitmap, and then in block bitmap.
> 2. Firstly free block in block bitmap, and then buddy bitmap.
> Then freed block will only be available to allocation when both buddy
> bitmap and block bitmap are updated by freeing.
> Allocation obeys rule 1 already, just do separated freeing with rule 2.
So we also don't need ext4_mb_load_buddy_gfp() before freeing the on-disk
bitmap, right? Continuing below...
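To make the two ordering rules concrete, here is a toy model (plain C,
invented purely for illustration; this is not kernel code):

/* Toy model of the alloc/free ordering rule discussed above. */
#include <stdbool.h>

static unsigned long buddy_bitmap;      /* bit set = busy in the buddy */
static unsigned long block_bitmap;      /* bit set = busy on disk */

/* Rule 1: allocate in the buddy bitmap first, then the block bitmap. */
static void toy_alloc(int bit)
{
        buddy_bitmap |= 1UL << bit;
        block_bitmap |= 1UL << bit;
}

/* Rule 2: free in the block bitmap first, then the buddy bitmap. */
static void toy_free(int bit)
{
        block_bitmap &= ~(1UL << bit);
        buddy_bitmap &= ~(1UL << bit);
}

/* A bit may be handed out only when both views agree it is free. */
static bool toy_allocatable(int bit)
{
        return !(buddy_bitmap & (1UL << bit)) &&
               !(block_bitmap & (1UL << bit));
}

Under any interleaving, a bit never looks free in the buddy while it is
still marked busy on disk, which is what makes the separated freeing safe.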
>
> Separated freeing has no race with generate_buddy as:
> Once ext4_mb_load_buddy_gfp is executed successfully, the up-to-date
> buddy page can be found in sbi->s_buddy_cache and no more buddy
> initialization of the buddy page will be executed concurrently until
> buddy page is unloaded. As we always do free in "load buddy, free,
> unload buddy" sequence, separated freeing has no race with generate_buddy.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
> fs/ext4/mballoc.c | 18 ++++++++----------
> 1 file changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index e650eac22237..01ad36a1cc96 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -6519,6 +6519,14 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
> if (err)
> goto error_return;
Let me add a piece of code before the added changes and continue...
err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
GFP_NOFS|__GFP_NOFAIL);
if (err)
goto error_return;
>
> + ext4_lock_group(sb, block_group);
> + mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
> + ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
> + ext4_free_group_clusters_set(sb, gdp, ret);
> + ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
> + ext4_group_desc_csum_set(sb, block_group, gdp);
> + ext4_unlock_group(sb, block_group);
> +
...Is it required for ext4_mb_load_buddy_gfp() to be called before
freeing on-disk bitmap blocks? Can you explain why, please?
At least it is not very clear to me why we need
ext4_mb_load_buddy_gfp() before the clearing of bitmap blocks. If it's not
needed then I think we should separate out the bitmap freeing logic and
buddy freeing logic even further.
Thoughts?
> /*
> * We need to make sure we don't reuse the freed block until after the
> * transaction is committed. We make an exception if the inode is to be
> @@ -6541,13 +6549,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
> new_entry->efd_tid = handle->h_transaction->t_tid;
>
> ext4_lock_group(sb, block_group);
> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
> ext4_mb_free_metadata(handle, &e4b, new_entry);
> } else {
> - /* need to update group_info->bb_free and bitmap
> - * with group lock held. generate_buddy look at
> - * them with group lock_held
> - */
> if (test_opt(sb, DISCARD)) {
> err = ext4_issue_discard(sb, block_group, bit,
> count_clusters, NULL);
> @@ -6560,14 +6563,9 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
> EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info);
>
> ext4_lock_group(sb, block_group);
> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
> mb_free_blocks(inode, &e4b, bit, count_clusters);
> }
>
> - ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
> - ext4_free_group_clusters_set(sb, gdp, ret);
> - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
> - ext4_group_desc_csum_set(sb, block_group, gdp);
> ext4_unlock_group(sb, block_group);
>
> if (sbi->s_log_groups_per_flex) {
Adding a piece of code here...
if (sbi->s_log_groups_per_flex) {
ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
atomic64_add(count_clusters,
&sbi_array_rcu_deref(sbi, s_flex_groups,
flex_group)->free_clusters);
}
<...>
/* We dirtied the bitmap block */
BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
/* And the group descriptor block */
BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
if (!err)
err = ret;
I was thinking even this could go alongside the bitmap freeing logic above.
Then the next patch becomes very clear: all of the bitmap freeing logic
is simply moved into the ext4_mb_mark_context() function.
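Roughly, the consolidated shape being suggested would look like this
(a sketch assembled from the pieces quoted above, error handling omitted;
not the actual patch):

        /* all bitmap-freeing pieces together, before any buddy work */
        bitmap_bh = ext4_read_block_bitmap(sb, block_group);
        err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
                                            EXT4_JTR_NONE);

        ext4_lock_group(sb, block_group);
        mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
        ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
        ext4_free_group_clusters_set(sb, gdp, ret);
        ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
        ext4_group_desc_csum_set(sb, block_group, gdp);
        ext4_unlock_group(sb, block_group);

        err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
        ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);

        /* only then: ext4_mb_load_buddy_gfp(), mb_free_blocks(), unload */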
-ritesh
* Re: [PATCH v6 06/11] ext4: call ext4_mb_mark_context in ext4_mb_clear_bb
2023-08-26 15:50 ` [PATCH v6 06/11] ext4: call ext4_mb_mark_context in ext4_mb_clear_bb Kemeng Shi
@ 2023-09-01 9:38 ` Ritesh Harjani
0 siblings, 0 replies; 36+ messages in thread
From: Ritesh Harjani @ 2023-09-01 9:38 UTC (permalink / raw)
To: Kemeng Shi, tytso, adilger.kernel, linux-ext4, linux-kernel
Kemeng Shi <shikemeng@huaweicloud.com> writes:
> Call ext4_mb_mark_context in ext4_mb_clear_bb to remove repeat code.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
> fs/ext4/mballoc.c | 83 +++++++++++------------------------------------
> 1 file changed, 19 insertions(+), 64 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 01ad36a1cc96..25265531cb6a 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -6424,21 +6424,20 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
> ext4_fsblk_t block, unsigned long count,
> int flags)
> {
> - struct buffer_head *bitmap_bh = NULL;
> + struct ext4_mark_context mc;
> struct super_block *sb = inode->i_sb;
> - struct ext4_group_desc *gdp;
> struct ext4_group_info *grp;
> unsigned int overflow;
> ext4_grpblk_t bit;
> - struct buffer_head *gd_bh;
> ext4_group_t block_group;
> struct ext4_sb_info *sbi;
> struct ext4_buddy e4b;
> unsigned int count_clusters;
> int err = 0;
> - int ret;
> + int mark_flags = 0;
>
> sbi = EXT4_SB(sb);
> + ext4_mb_prepare_mark_context(&mc, handle, sb, 0);
>
> if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) &&
> !ext4_inode_block_valid(inode, block, count)) {
> @@ -6468,18 +6467,6 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
> /* The range changed so it's no longer validated */
> flags &= ~EXT4_FREE_BLOCKS_VALIDATED;
> }
> - count_clusters = EXT4_NUM_B2C(sbi, count);
> - bitmap_bh = ext4_read_block_bitmap(sb, block_group);
> - if (IS_ERR(bitmap_bh)) {
> - err = PTR_ERR(bitmap_bh);
> - bitmap_bh = NULL;
> - goto error_return;
> - }
> - gdp = ext4_get_group_desc(sb, block_group, &gd_bh);
> - if (!gdp) {
> - err = -EIO;
> - goto error_return;
> - }
>
> if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) &&
> !ext4_inode_block_valid(inode, block, count)) {
> @@ -6489,28 +6476,7 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
> goto error_return;
> }
>
> - BUFFER_TRACE(bitmap_bh, "getting write access");
> - err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
> - EXT4_JTR_NONE);
> - if (err)
> - goto error_return;
> -
> - /*
> - * We are about to modify some metadata. Call the journal APIs
> - * to unshare ->b_data if a currently-committing transaction is
> - * using it
> - */
> - BUFFER_TRACE(gd_bh, "get_write_access");
> - err = ext4_journal_get_write_access(handle, sb, gd_bh, EXT4_JTR_NONE);
> - if (err)
> - goto error_return;
> -#ifdef AGGRESSIVE_CHECK
> - {
> - int i;
> - for (i = 0; i < count_clusters; i++)
> - BUG_ON(!mb_test_bit(bit + i, bitmap_bh->b_data));
> - }
> -#endif
> + count_clusters = EXT4_NUM_B2C(sbi, count);
> trace_ext4_mballoc_free(sb, inode, block_group, bit, count_clusters);
>
> /* __GFP_NOFAIL: retry infinitely, ignore TIF_MEMDIE and memcg limit. */
> @@ -6519,13 +6485,21 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
> if (err)
> goto error_return;
>
> - ext4_lock_group(sb, block_group);
> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
> - ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
> - ext4_free_group_clusters_set(sb, gdp, ret);
> - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
> - ext4_group_desc_csum_set(sb, block_group, gdp);
> - ext4_unlock_group(sb, block_group);
> +#ifdef AGGRESSIVE_CHECK
> + mark_flags |= EXT4_MB_BITMAP_MARKED_CHECK;
> +#endif
> + err = ext4_mb_mark_context(&mc, block_group, bit, count_clusters,
> + mark_flags);
> +
> +
> + if (err && mc.changed == 0) {
> + ext4_mb_unload_buddy(&e4b);
> + goto error_return;
> + }
> +
> +#ifdef AGGRESSIVE_CHECK
> + BUG_ON(mc.changed != count_clusters);
> +#endif
>
> /*
> * We need to make sure we don't reuse the freed block until after the
> @@ -6568,13 +6542,6 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>
> ext4_unlock_group(sb, block_group);
>
> - if (sbi->s_log_groups_per_flex) {
> - ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
> - atomic64_add(count_clusters,
> - &sbi_array_rcu_deref(sbi, s_flex_groups,
> - flex_group)->free_clusters);
> - }
> -
> /*
> * on a bigalloc file system, defer the s_freeclusters_counter
> * update to the caller (ext4_remove_space and friends) so they
> @@ -6589,26 +6556,14 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>
> ext4_mb_unload_buddy(&e4b);
>
> - /* We dirtied the bitmap block */
> - BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
> - err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
> -
> - /* And the group descriptor block */
> - BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
> - ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
> - if (!err)
> - err = ret;
> -
As commented on the previous patch, it would be much simpler if we could
move all the required pieces of the bitmap freeing logic to one place.
Then this patch just becomes removing them all and using the already
created ext4_mb_mark_context() function.
-ritesh
* Re: [PATCH v6 08/11] ext4: call ext4_mb_mark_context in ext4_group_add_blocks()
2023-08-26 15:50 ` [PATCH v6 08/11] ext4: call ext4_mb_mark_context " Kemeng Shi
@ 2023-09-01 9:50 ` Ritesh Harjani
0 siblings, 0 replies; 36+ messages in thread
From: Ritesh Harjani @ 2023-09-01 9:50 UTC (permalink / raw)
To: Kemeng Shi, tytso, adilger.kernel, linux-ext4, linux-kernel
Kemeng Shi <shikemeng@huaweicloud.com> writes:
> Call ext4_mb_mark_context in ext4_group_add_blocks() to remove repeat code.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
> fs/ext4/mballoc.c | 84 +++++++----------------------------------------
> 1 file changed, 12 insertions(+), 72 deletions(-)
I think the same comments (as on patch-5 & patch-6) hold for this and the
previous patch as well, where we are separating out the bitmap free and
buddy free from ext4_group_add_blocks().
-ritesh
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index bb08a71a6e61..fdffa3b40bcd 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -6681,25 +6681,22 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
> int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
> ext4_fsblk_t block, unsigned long count)
> {
> - struct buffer_head *bitmap_bh = NULL;
> - struct buffer_head *gd_bh;
> + struct ext4_mark_context mc;
> ext4_group_t block_group;
> ext4_grpblk_t bit;
> - unsigned int i;
> - struct ext4_group_desc *desc;
> struct ext4_sb_info *sbi = EXT4_SB(sb);
> struct ext4_buddy e4b;
> - int err = 0, ret, free_clusters_count;
> - ext4_grpblk_t clusters_freed;
> + int err = 0;
> ext4_fsblk_t first_cluster = EXT4_B2C(sbi, block);
> ext4_fsblk_t last_cluster = EXT4_B2C(sbi, block + count - 1);
> unsigned long cluster_count = last_cluster - first_cluster + 1;
>
> ext4_debug("Adding block(s) %llu-%llu\n", block, block + count - 1);
>
> - if (count == 0)
> + if (cluster_count == 0)
> return 0;
>
> + ext4_mb_prepare_mark_context(&mc, handle, sb, 0);
> ext4_get_group_no_and_offset(sb, block, &block_group, &bit);
> /*
> * Check to see if we are freeing blocks across a group
> @@ -6712,19 +6709,6 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
> goto error_return;
> }
>
> - bitmap_bh = ext4_read_block_bitmap(sb, block_group);
> - if (IS_ERR(bitmap_bh)) {
> - err = PTR_ERR(bitmap_bh);
> - bitmap_bh = NULL;
> - goto error_return;
> - }
> -
> - desc = ext4_get_group_desc(sb, block_group, &gd_bh);
> - if (!desc) {
> - err = -EIO;
> - goto error_return;
> - }
> -
> if (!ext4_sb_block_valid(sb, NULL, block, count)) {
> ext4_error(sb, "Adding blocks in system zones - "
> "Block = %llu, count = %lu",
> @@ -6733,74 +6717,30 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
> goto error_return;
> }
>
> - BUFFER_TRACE(bitmap_bh, "getting write access");
> - err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
> - EXT4_JTR_NONE);
> + err = ext4_mb_load_buddy(sb, block_group, &e4b);
> if (err)
> goto error_return;
>
> - /*
> - * We are about to modify some metadata. Call the journal APIs
> - * to unshare ->b_data if a currently-committing transaction is
> - * using it
> - */
> - BUFFER_TRACE(gd_bh, "get_write_access");
> - err = ext4_journal_get_write_access(handle, sb, gd_bh, EXT4_JTR_NONE);
> - if (err)
> + err = ext4_mb_mark_context(&mc, block_group, bit, cluster_count,
> + EXT4_MB_BITMAP_MARKED_CHECK);
> + if (err && mc.changed == 0) {
> + ext4_mb_unload_buddy(&e4b);
> goto error_return;
> -
> - for (i = 0, clusters_freed = 0; i < cluster_count; i++) {
> - BUFFER_TRACE(bitmap_bh, "clear bit");
> - if (!mb_test_bit(bit + i, bitmap_bh->b_data)) {
> - ext4_error(sb, "bit already cleared for block %llu",
> - (ext4_fsblk_t)(block + i));
> - BUFFER_TRACE(bitmap_bh, "bit already cleared");
> - } else {
> - clusters_freed++;
> - }
> }
>
> - err = ext4_mb_load_buddy(sb, block_group, &e4b);
> - if (err)
> - goto error_return;
> -
> - ext4_lock_group(sb, block_group);
> - mb_clear_bits(bitmap_bh->b_data, bit, cluster_count);
> - free_clusters_count = clusters_freed +
> - ext4_free_group_clusters(sb, desc);
> - ext4_free_group_clusters_set(sb, desc, free_clusters_count);
> - ext4_block_bitmap_csum_set(sb, desc, bitmap_bh);
> - ext4_group_desc_csum_set(sb, block_group, desc);
> - ext4_unlock_group(sb, block_group);
> + if (mc.changed != cluster_count)
> + ext4_error(sb, "bit already cleared in group %u", block_group);
>
> ext4_lock_group(sb, block_group);
> mb_free_blocks(NULL, &e4b, bit, cluster_count);
> ext4_unlock_group(sb, block_group);
>
> percpu_counter_add(&sbi->s_freeclusters_counter,
> - clusters_freed);
> -
> - if (sbi->s_log_groups_per_flex) {
> - ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
> - atomic64_add(clusters_freed,
> - &sbi_array_rcu_deref(sbi, s_flex_groups,
> - flex_group)->free_clusters);
> - }
> + mc.changed);
>
> ext4_mb_unload_buddy(&e4b);
>
> - /* We dirtied the bitmap block */
> - BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
> - err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
> -
> - /* And the group descriptor block */
> - BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
> - ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
> - if (!err)
> - err = ret;
> -
> error_return:
> - brelse(bitmap_bh);
> ext4_std_error(sb, err);
> return err;
> }
> --
> 2.30.0
* Re: [PATCH v6 09/11] ext4: add some kunit stub for mballoc kunit test
2023-08-26 15:50 ` [PATCH v6 09/11] ext4: add some kunit stub for mballoc kunit test Kemeng Shi
@ 2023-09-01 14:18 ` Ritesh Harjani
0 siblings, 0 replies; 36+ messages in thread
From: Ritesh Harjani @ 2023-09-01 14:18 UTC (permalink / raw)
To: Kemeng Shi, tytso, adilger.kernel, linux-ext4, linux-kernel
Kemeng Shi <shikemeng@huaweicloud.com> writes:
> Multiblock allocation will read and write the block bitmap and group
> descriptor which reside on disk. Add kunit stubs to the functions
> ext4_get_group_desc, ext4_read_block_bitmap_nowait, ext4_wait_block_bitmap
> and ext4_mb_mark_context to avoid real IO to disk.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
> fs/ext4/balloc.c | 10 ++++++++++
> fs/ext4/mballoc.c | 4 ++++
> 2 files changed, 14 insertions(+)
The patch looks good to me. Nice work in identifying the functions which
do I/O in the mballoc context.
Feel free to add -
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
-ritesh
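For readers following along: the stub mechanism plants a redirect hook at
the top of the real function, and a test can then swap in a replacement at
run time. A minimal sketch of the idea (the function names below are
invented, not the ones used in the patch):

#include <kunit/static_stub.h>
#include <kunit/test.h>
#include <linux/errno.h>

static int mbt_read_bitmap(int group)
{
        /* first statement: let an active KUnit test redirect this call */
        KUNIT_STATIC_STUB_REDIRECT(mbt_read_bitmap, group);
        return -EIO;    /* stand-in for the real disk I/O path */
}

static int mbt_read_bitmap_stub(int group)
{
        return 0;       /* no disk access under test */
}

static void mbt_example_test(struct kunit *test)
{
        kunit_activate_static_stub(test, mbt_read_bitmap,
                                   mbt_read_bitmap_stub);
        KUNIT_EXPECT_EQ(test, mbt_read_bitmap(0), 0);
        /* the stub is deactivated automatically when the test exits */
}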
* Re: [PATCH v6 10/11] ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc
2023-08-26 15:50 ` [PATCH v6 10/11] ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc Kemeng Shi
@ 2023-09-01 14:29 ` Ritesh Harjani
0 siblings, 0 replies; 36+ messages in thread
From: Ritesh Harjani @ 2023-09-01 14:29 UTC (permalink / raw)
To: Kemeng Shi, tytso, adilger.kernel, linux-ext4, linux-kernel
Kemeng Shi <shikemeng@huaweicloud.com> writes:
> Here is the preparation work:
> 1. Include mballoc-test.c in mballoc.c to be able to test static functions
> in mballoc.c.
> 2. Implement static stubs to avoid read IO to disk.
> 3. Construct a fake super_block. Only partial members are set; more members
> will be set when more functions are tested.
> Then a unit test for ext4_mb_new_blocks_simple is added.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
> fs/ext4/mballoc-test.c | 322 +++++++++++++++++++++++++++++++++++++++++
> fs/ext4/mballoc.c | 4 +
> 2 files changed, 326 insertions(+)
> create mode 100644 fs/ext4/mballoc-test.c
Thanks for working on the review comments. The purpose of the functions and
structures is much clearer now. Also the approach of including
mballoc-test.c within #ifdef macros inside mballoc.c looks ok to me.
Nice work in getting this ready!
ext4_mb_new_blocks_simple() is a good start (even though as of now
it is just a simple allocator for the fast commit replay path)
-ritesh
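For reference, the include-the-test-file approach boils down to a couple
of lines at the bottom of mballoc.c, guarded by the ext4 KUnit config
symbol (sketch):

#ifdef CONFIG_EXT4_KUNIT_TESTS
#include "mballoc-test.c"
#endif

This keeps the test code out of normal builds while still giving
mballoc-test.c access to the static functions in mballoc.c.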
* Re: [PATCH v6 11/11] ext4: run mballoc test with different layouts setting
2023-08-26 15:50 ` [PATCH v6 11/11] ext4: run mballoc test with different layouts setting Kemeng Shi
@ 2023-09-01 14:36 ` Ritesh Harjani
2023-09-04 3:01 ` Kemeng Shi
0 siblings, 1 reply; 36+ messages in thread
From: Ritesh Harjani @ 2023-09-01 14:36 UTC (permalink / raw)
To: Kemeng Shi, tytso, adilger.kernel, linux-ext4, linux-kernel
Kemeng Shi <shikemeng@huaweicloud.com> writes:
> Use KUNIT_CASE_PARAM to run mbalaloc test with different layouts setting.
^^^ mballoc
small nit below
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
> fs/ext4/mballoc-test.c | 52 ++++++++++++++++++++++++++++++------------
> 1 file changed, 38 insertions(+), 14 deletions(-)
>
> diff --git a/fs/ext4/mballoc-test.c b/fs/ext4/mballoc-test.c
> index d643c56ac003..af48a39c8ba2 100644
> --- a/fs/ext4/mballoc-test.c
> +++ b/fs/ext4/mballoc-test.c
> @@ -196,21 +196,11 @@ static int ext4_mb_mark_context_stub(struct ext4_mark_context *mc,
> return 0;
> }
>
> -#define TEST_BLOCKSIZE_BITS 10
> -#define TEST_CLUSTER_BITS 3
> -#define TEST_BLOCKS_PER_GROUP 8192
> -#define TEST_GROUP_COUNT 4
> -#define TEST_DESC_SIZE 64
> #define TEST_GOAL_GROUP 1
> static int mbt_kunit_init(struct kunit *test)
> {
> - struct mbt_ext4_block_layout layout = {
> - .blocksize_bits = TEST_BLOCKSIZE_BITS,
> - .cluster_bits = TEST_CLUSTER_BITS,
> - .blocks_per_group = TEST_BLOCKS_PER_GROUP,
> - .group_count = TEST_GROUP_COUNT,
> - .desc_size = TEST_DESC_SIZE,
> - };
> + struct mbt_ext4_block_layout *layout =
> + (struct mbt_ext4_block_layout *)(test->param_value);
> struct super_block *sb;
> int ret;
>
> @@ -218,7 +208,7 @@ static int mbt_kunit_init(struct kunit *test)
> if (sb == NULL)
> return -ENOMEM;
>
> - mbt_init_sb_layout(sb, &layout);
> + mbt_init_sb_layout(sb, layout);
>
> ret = mbt_ctx_init(sb);
> if (ret != 0) {
> @@ -304,9 +294,43 @@ static void test_new_blocks_simple(struct kunit *test)
> "unexpectedly get block when no block is available");
> }
>
> +static const struct mbt_ext4_block_layout mbt_test_layouts[] = {
> + {
> + .blocksize_bits = 10,
> + .cluster_bits = 3,
> + .blocks_per_group = 8192,
> + .group_count = 4,
> + .desc_size = 64,
> + },
> + {
> + .blocksize_bits = 12,
> + .cluster_bits = 3,
> + .blocks_per_group = 8192,
> + .group_count = 4,
> + .desc_size = 64,
> + },
> + {
> + .blocksize_bits = 18,
64k blocksize is more common, due to platforms with a 64k pagesize like
Power and sometimes arm64, so I would rather make it 16 here (2^16 = 64k).
I tested it on Power -
[ 2.546687][ T1] KTAP version 1
[ 2.547123][ T1] 1..2
[ 2.547447][ T1] KTAP version 1
[ 2.547927][ T1] # Subtest: ext4_mballoc_test
[ 2.548562][ T1] 1..1
[ 2.548933][ T1] KTAP version 1
[ 2.549457][ T1] # Subtest: test_new_blocks_simple
[ 2.549550][ T108] kunit_try_catch (108) used greatest stack depth: 14512 bytes left
[ 2.549644][ T1] ok 1 block_bits=10 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
[ 2.552780][ T110] kunit_try_catch (110) used greatest stack depth: 14464 bytes left
[ 2.552882][ T1] ok 2 block_bits=12 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
[ 2.555909][ T1] ok 3 block_bits=18 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
[ 2.557184][ T1] # test_new_blocks_simple: pass:3 fail:0 skip:0 total:3
[ 2.557186][ T1] ok 1 test_new_blocks_simple
[ 2.558083][ T1] # Totals: pass:3 fail:0 skip:0 total:3
[ 2.558688][ T1] ok 1 ext4_mballoc_test
Looks good to me. Feel free to add -
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
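For reference, the parameterisation glue is small: KUNIT_ARRAY_PARAM
generates a parameter generator from the layouts array, and
KUNIT_CASE_PARAM wires it into the test case. A sketch (the helper name
mbt_show_layout and the field types are assumptions; the description
format matches the log above):

static void mbt_show_layout(const struct mbt_ext4_block_layout *layout,
                            char *desc)
{
        snprintf(desc, KUNIT_PARAM_DESC_SIZE,
                 "block_bits=%u cluster_bits=%u blocks_per_group=%u "
                 "group_count=%u desc_size=%u",
                 layout->blocksize_bits, layout->cluster_bits,
                 layout->blocks_per_group, layout->group_count,
                 layout->desc_size);
}
KUNIT_ARRAY_PARAM(mbt_layouts, mbt_test_layouts, mbt_show_layout);

static struct kunit_case mbt_test_cases[] = {
        KUNIT_CASE_PARAM(test_new_blocks_simple, mbt_layouts_gen_params),
        {}
};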
* Re: [PATCH v6 01/11] ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb
2023-08-31 14:07 ` Ritesh Harjani
@ 2023-09-04 2:50 ` Kemeng Shi
2023-09-04 8:30 ` Ritesh Harjani
0 siblings, 1 reply; 36+ messages in thread
From: Kemeng Shi @ 2023-09-04 2:50 UTC (permalink / raw)
To: Ritesh Harjani, tytso, adilger.kernel, linux-ext4, linux-kernel
on 8/31/2023 10:07 PM, Ritesh Harjani wrote:
> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>
>> on 8/31/2023 8:33 PM, Ritesh Harjani wrote:
>>> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>>>
>>> Hello Kemeng,
>>>
>>>> There are several reasons to add a general function to update block
>>>> bitmap and group descriptor on disk:
>>>
>>> ... named ext4_mb_mark_context(<params>)
>>>
>>>> 1. pair behavior of alloc/free bits. For example,
>>>> ext4_mb_new_blocks_simple will update free_clusters in struct flex_groups
>>>> in ext4_mb_mark_bb while ext4_free_blocks_simple forgets this.
>>>> 2. remove repeat code to read from disk, update and write back to disk.
>>>> 3. reduce future unit test mocks to catch real IO to update structure
>>>> on disk.
>>>>
>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
>>>> ---
>>>> fs/ext4/mballoc.c | 169 +++++++++++++++++++++++++++-------------------
>>>> 1 file changed, 99 insertions(+), 70 deletions(-)
>>>>
>>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>>> index c91db9f57524..e2be572deb75 100644
>>>> --- a/fs/ext4/mballoc.c
>>>> +++ b/fs/ext4/mballoc.c
>>>> @@ -3952,6 +3952,100 @@ void ext4_exit_mballoc(void)
>>>> ext4_groupinfo_destroy_slabs();
>>>> }
>>>>
>>>> +/*
>>>> + * Collect global setting to reduce the number of variable passing to
>>>> + * ext4_mb_mark_context. Pass target group blocks range directly to
>>>> + * reuse the prepared global setting for multiple block ranges and
>>>> + * to show clearly the specific block range will be marked.
>>>> + */
>>>> +struct ext4_mark_context {
>>>> + struct super_block *sb;
>>>> + int state;
>>>> +};
>>>
>>> This structure definition does not reflect its naming.
>>> Why can't we also add cblk & clen, flags to it?
>>>
>>> I think the idea of defining a new function named
>>> ext4_mb_prepare_mark_context() was that we can prepare "struct ext4_mark_context"
>>> with different cblk, clen & flags arguments for cases where we might
>>> have to call ext4_mb_mark_context() more than once in the same function
>>> or call ext4_mb_mark_context() anywhere but at the start of the function.
>>>
>>> As I see it in the current series, we are calling
>>> ext4_mb_prepare_mark_context() at the start of every function. Just for
>>> this purpose we don't need an extra function, right? That we can directly do
>>> at the time of declaring a structure variable itself (like how you did
>>> in previous version)
>>>
>> Hi Ritesh, thanks for the reply. The ext4_mark_context structure aims to reduce
>> the variables passed to ext4_mb_mark_context. If we have to prepare a lot of
>> members in ext4_mb_prepare_mark_context, then the too-many-variables issue just
>> moves into ext4_mb_prepare_mark_context.
>> The name ext4_mark_context may not be proper. Actually I want a structure
>> to collect information which is not strongly relevant to marking blk bits. In
>> this way, we can initialize it at the beginning of the function; then there is no
>> need to pay attention to it or to pass each field separately in every call to
>> ext4_mb_mark_context. Instead, we focus on the useful information passed
>> with each ext4_mb_mark_context call.
>> This design also achieves the goal of preparing ext4_mark_context once for
>> multiple uses in the same function, as ext4_mark_context is unlikely to change
>> after initialization at the beginning.
>>> What do you think of the approach where we add cblk, clen & flags
>>> variables to ext4_mark_context()? Do you see any problems/difficulties
>>> with that design?
>>>
>> The proposed design looks good to me. Please let me know if you still
>> prefer this and I will do it in the next version. Thanks!
>>
>
> I would have still preferred the block and len arguments inside struct
> ext4_mark_context, because that better explains the use and definition of
> the structure and its prepare function.
> However, since this is not any functionality change, I am fine if you
> prefer the current design (as you mentioned above).
> We can always discuss & change it later too :)
>
Thanks for the review. Since more improvement is needed, I would like to
define ext4_mark_context as you suggested in the previous version:
ext4_mark_context {
ext4_group_t mc_group; /* block group */
ext4_grpblk_t mc_clblk; /* block in cluster units */
ext4_grpblk_t mc_cllen; /* len in cluster units */
ext4_grpblk_t mc_clupdates; /* number of clusters marked/unmarked */
unsigned int mc_flags; /* flags ... */
bool mc_state; /* to set or unset state */
};
And super_block and handle are passed as arguments.
Besides, as we will pass a lot of arguments to the prepare function anyway,
what about dropping it and simply passing all arguments to
ext4_mb_mark_context directly:
static inline void ext4_mb_mark_context(handle_t *handle,
struct super_block *sb,
int state,
ext4_group_t group,
ext4_grpblk_t blkoff,
ext4_grpblk_t len,
int flags,
ext4_grpblk_t *changed)
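With that flattened signature, a free-path call site would look roughly
like this (a sketch reusing the flags already introduced in this series,
and assuming the function returns an int error code as it does elsewhere):

        ext4_grpblk_t changed;

        err = ext4_mb_mark_context(handle, sb, false, block_group, bit,
                                   count_clusters,
                                   EXT4_MB_BITMAP_MARKED_CHECK, &changed);
        if (err && changed == 0)
                goto error_return;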
Looking forward to your reply. Thanks!
> Since otherwise the refactoring changes looks good to me.
> Please feel free to add -
>
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>
> Thanks!
> -ritesh
>
* Re: [PATCH v6 02/11] ext4: call ext4_mb_mark_context in ext4_free_blocks_simple
2023-08-31 14:25 ` Ritesh Harjani
@ 2023-09-04 2:51 ` Kemeng Shi
0 siblings, 0 replies; 36+ messages in thread
From: Kemeng Shi @ 2023-09-04 2:51 UTC (permalink / raw)
To: Ritesh Harjani, tytso, adilger.kernel, linux-ext4, linux-kernel
on 8/31/2023 10:25 PM, Ritesh Harjani wrote:
> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>
>> call ext4_mb_mark_context in ext4_free_blocks_simple to:
>> 1. remove repeat code
>> 2. pair update of free_clusters in ext4_mb_new_blocks_simple.
>> 3. add missing ext4_lock_group/ext4_unlock_group protection.
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>> fs/ext4/mballoc.c | 35 +++--------------------------------
>> 1 file changed, 3 insertions(+), 32 deletions(-)
>
> Looks good to me. Please feel free to add -
>
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>
> (One small comment below for previous patch)
>
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index e2be572deb75..c803f74aaf63 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -6414,43 +6414,14 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
>> static void ext4_free_blocks_simple(struct inode *inode, ext4_fsblk_t block,
>> unsigned long count)
>> {
>> - struct buffer_head *bitmap_bh;
>> + struct ext4_mark_context mc;
>> struct super_block *sb = inode->i_sb;
>> - struct ext4_group_desc *gdp;
>> - struct buffer_head *gdp_bh;
>> ext4_group_t group;
>> ext4_grpblk_t blkoff;
>> - int already_freed = 0, err, i;
>>
>> + ext4_mb_prepare_mark_context(&mc, sb, 0);
>
> It looks like we always use 0 or 1 as the state for struct
> ext4_mark_context. In that case we can make the state member of this struct
> a bool instead of an int.
Got it. Thanks for pointing it out!
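i.e. a sketch of the suggested tweak:

struct ext4_mark_context {
        struct super_block *sb;
        bool state;     /* true: set bits (allocate), false: clear (free) */
};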
>
>
>> ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
>> - bitmap_bh = ext4_read_block_bitmap(sb, group);
>> - if (IS_ERR(bitmap_bh)) {
>> - pr_warn("Failed to read block bitmap\n");
>> - return;
>> - }
>> - gdp = ext4_get_group_desc(sb, group, &gdp_bh);
>> - if (!gdp)
>> - goto err_out;
>> -
>> - for (i = 0; i < count; i++) {
>> - if (!mb_test_bit(blkoff + i, bitmap_bh->b_data))
>> - already_freed++;
>> - }
>> - mb_clear_bits(bitmap_bh->b_data, blkoff, count);
>> - err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
>> - if (err)
>> - goto err_out;
>> - ext4_free_group_clusters_set(
>> - sb, gdp, ext4_free_group_clusters(sb, gdp) +
>> - count - already_freed);
>> - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>> - ext4_group_desc_csum_set(sb, group, gdp);
>> - ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
>> - sync_dirty_buffer(bitmap_bh);
>> - sync_dirty_buffer(gdp_bh);
>> -
>> -err_out:
>> - brelse(bitmap_bh);
>> + ext4_mb_mark_context(&mc, group, blkoff, count);
>> }
>>
>> /**
>> --
>> 2.30.0
>
* Re: [PATCH v6 04/11] ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used
2023-09-01 3:51 ` Ritesh Harjani
@ 2023-09-04 2:54 ` Kemeng Shi
0 siblings, 0 replies; 36+ messages in thread
From: Kemeng Shi @ 2023-09-04 2:54 UTC (permalink / raw)
To: Ritesh Harjani, tytso, adilger.kernel, linux-ext4, linux-kernel
on 9/1/2023 11:51 AM, Ritesh Harjani wrote:
> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>
>> call ext4_mb_mark_context in ext4_mb_mark_diskspace_used to:
>> 1. remove repeat code to normally update bitmap and group descriptor
>> on disk.
>> 2. call ext4_mb_mark_context instead of only setting bits in block bitmap
>> to fix the bitmap. Function ext4_mb_mark_context will also update
>> the checksum of the bitmap and other counters along with the bit change,
>> to keep them consistent with it; otherwise the block bitmap will be marked
>> corrupted, as its checksum would be in an inconsistent state.
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
>> ---
>> fs/ext4/mballoc.c | 86 +++++++++++------------------------------------
>> 1 file changed, 20 insertions(+), 66 deletions(-)
>
> I was wondering whether checking for !ext4_inode_block_valid() can also
> be part of ext4_mb_mark_context() by passing an EXT4_MB_METABLOCKS_VALID_CHECK
> flag.
Looks great to me. Thanks for the suggestion; I will do this in the next version.
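A sketch of what such a check inside ext4_mb_mark_context might look like
(EXT4_MB_METABLOCKS_VALID_CHECK is only proposed in this thread, and the
inode would have to be passed in as well; everything here is hypothetical):

        if (flags & EXT4_MB_METABLOCKS_VALID_CHECK) {
                ext4_fsblk_t blk = ext4_group_first_block_no(sb, group) +
                                   EXT4_C2B(sbi, blkoff);

                if (!ext4_inode_block_valid(inode, blk, EXT4_C2B(sbi, len)))
                        return -EFSCORRUPTED;
        }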
>
> But as for this patch. It looks good to me. Please feel free to add -
>
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>
> -ritesh
>
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index b066ee018cdb..e650eac22237 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -4084,46 +4084,28 @@ static noinline_for_stack int
>> ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
>> handle_t *handle, unsigned int reserv_clstrs)
>> {
>> - struct buffer_head *bitmap_bh = NULL;
>> + struct ext4_mark_context mc;
>> struct ext4_group_desc *gdp;
>> - struct buffer_head *gdp_bh;
>> struct ext4_sb_info *sbi;
>> struct super_block *sb;
>> ext4_fsblk_t block;
>> int err, len;
>> + int flags = 0;
>>
>> BUG_ON(ac->ac_status != AC_STATUS_FOUND);
>> BUG_ON(ac->ac_b_ex.fe_len <= 0);
>>
>> sb = ac->ac_sb;
>> sbi = EXT4_SB(sb);
>> + ext4_mb_prepare_mark_context(&mc, handle, sb, 1);
>>
>> - bitmap_bh = ext4_read_block_bitmap(sb, ac->ac_b_ex.fe_group);
>> - if (IS_ERR(bitmap_bh)) {
>> - return PTR_ERR(bitmap_bh);
>> - }
>> -
>> - BUFFER_TRACE(bitmap_bh, "getting write access");
>> - err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
>> - EXT4_JTR_NONE);
>> - if (err)
>> - goto out_err;
>> -
>> - err = -EIO;
>> - gdp = ext4_get_group_desc(sb, ac->ac_b_ex.fe_group, &gdp_bh);
>> + gdp = ext4_get_group_desc(sb, ac->ac_b_ex.fe_group, NULL);
>> if (!gdp)
>> - goto out_err;
>> -
>> + return -EIO;
>> ext4_debug("using block group %u(%d)\n", ac->ac_b_ex.fe_group,
>> ext4_free_group_clusters(sb, gdp));
>>
>> - BUFFER_TRACE(gdp_bh, "get_write_access");
>> - err = ext4_journal_get_write_access(handle, sb, gdp_bh, EXT4_JTR_NONE);
>> - if (err)
>> - goto out_err;
>> -
>> block = ext4_grp_offs_to_block(sb, &ac->ac_b_ex);
>> -
>> len = EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
>> if (!ext4_inode_block_valid(ac->ac_inode, block, len)) {
>> ext4_error(sb, "Allocating blocks %llu-%llu which overlap "
>> @@ -4132,41 +4114,28 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
>> * Fix the bitmap and return EFSCORRUPTED
>> * We leak some of the blocks here.
>> */
>> - ext4_lock_group(sb, ac->ac_b_ex.fe_group);
>> - mb_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start,
>> - ac->ac_b_ex.fe_len);
>> - ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
>> - err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
>> + err = ext4_mb_mark_context(&mc, ac->ac_b_ex.fe_group,
>> + ac->ac_b_ex.fe_start,
>> + ac->ac_b_ex.fe_len,
>> + 0);
>> if (!err)
>> err = -EFSCORRUPTED;
>> - goto out_err;
>> + return err;
>> }
>>
>> - ext4_lock_group(sb, ac->ac_b_ex.fe_group);
>> #ifdef AGGRESSIVE_CHECK
>> - {
>> - int i;
>> - for (i = 0; i < ac->ac_b_ex.fe_len; i++) {
>> - BUG_ON(mb_test_bit(ac->ac_b_ex.fe_start + i,
>> - bitmap_bh->b_data));
>> - }
>> - }
>> + flags |= EXT4_MB_BITMAP_MARKED_CHECK;
>> #endif
>> - mb_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start,
>> - ac->ac_b_ex.fe_len);
>> - if (ext4_has_group_desc_csum(sb) &&
>> - (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
>> - gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
>> - ext4_free_group_clusters_set(sb, gdp,
>> - ext4_free_clusters_after_init(sb,
>> - ac->ac_b_ex.fe_group, gdp));
>> - }
>> - len = ext4_free_group_clusters(sb, gdp) - ac->ac_b_ex.fe_len;
>> - ext4_free_group_clusters_set(sb, gdp, len);
>> - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>> - ext4_group_desc_csum_set(sb, ac->ac_b_ex.fe_group, gdp);
>> + err = ext4_mb_mark_context(&mc, ac->ac_b_ex.fe_group,
>> + ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len,
>> + flags);
>> +
>> + if (err && mc.changed == 0)
>> + return err;
>>
>> - ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
>> +#ifdef AGGRESSIVE_CHECK
>> + BUG_ON(mc.changed != ac->ac_b_ex.fe_len);
>> +#endif
>> percpu_counter_sub(&sbi->s_freeclusters_counter, ac->ac_b_ex.fe_len);
>> /*
>> * Now reduce the dirty block count also. Should not go negative
>> @@ -4176,21 +4145,6 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
>> percpu_counter_sub(&sbi->s_dirtyclusters_counter,
>> reserv_clstrs);
>>
>> - if (sbi->s_log_groups_per_flex) {
>> - ext4_group_t flex_group = ext4_flex_group(sbi,
>> - ac->ac_b_ex.fe_group);
>> - atomic64_sub(ac->ac_b_ex.fe_len,
>> - &sbi_array_rcu_deref(sbi, s_flex_groups,
>> - flex_group)->free_clusters);
>> - }
>> -
>> - err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
>> - if (err)
>> - goto out_err;
>> - err = ext4_handle_dirty_metadata(handle, NULL, gdp_bh);
>> -
>> -out_err:
>> - brelse(bitmap_bh);
>> return err;
>> }
>>
>> --
>> 2.30.0
>
* Re: [PATCH v6 05/11] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb()
2023-09-01 9:34 ` Ritesh Harjani
@ 2023-09-04 3:00 ` Kemeng Shi
2023-09-12 7:02 ` Kemeng Shi
0 siblings, 1 reply; 36+ messages in thread
From: Kemeng Shi @ 2023-09-04 3:00 UTC (permalink / raw)
To: Ritesh Harjani, tytso, adilger.kernel, linux-ext4, linux-kernel
on 9/1/2023 5:34 PM, Ritesh Harjani wrote:
> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>
>> This patch separates block bitmap and buddy bitmap freeing in order to
>> udpate block bitmap with ext4_mb_mark_context in following patch.
> ^^^ update.
>
>>
>> Separated freeing is safe with concurrent allocation as long as:
>> 1. Firstly allocate block in buddy bitmap, and then in block bitmap.
>> 2. Firstly free block in block bitmap, and then buddy bitmap.
>> Then freed block will only be available to allocation when both buddy
>> bitmap and block bitmap are updated by freeing.
>> Allocation obeys rule 1 already, just do separated freeing with rule 2.
>
>> So we also don't need ext4_mb_load_buddy_gfp() before freeing the on-disk
>> bitmap, right? Continuing below...
>
>>
>> Separated freeing has no race with generate_buddy as:
>> Once ext4_mb_load_buddy_gfp is executed successfully, the up-to-date
>> buddy page can be found in sbi->s_buddy_cache and no more buddy
>> initialization of the buddy page will be executed concurrently until
>> buddy page is unloaded. As we always do free in "load buddy, free,
>> unload buddy" sequence, separated freeing has no race with generate_buddy.
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>> fs/ext4/mballoc.c | 18 ++++++++----------
>> 1 file changed, 8 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index e650eac22237..01ad36a1cc96 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -6519,6 +6519,14 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>> if (err)
>> goto error_return;
>
>
> Let me add a piece of code before the added changes and continue...
>
> err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
> GFP_NOFS|__GFP_NOFAIL);
> if (err)
> goto error_return;
>>
>> + ext4_lock_group(sb, block_group);
>> + mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>> + ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>> + ext4_free_group_clusters_set(sb, gdp, ret);
>> + ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>> + ext4_group_desc_csum_set(sb, block_group, gdp);
>> + ext4_unlock_group(sb, block_group);
>> +
>
> ...Is it required for ext4_mb_load_buddy_gfp() to be called before
> freeing on-disk bitmap blocks? Can you explain why, please?
> At least it is not very clear to me why we need
> ext4_mb_load_buddy_gfp() before the clearing of bitmap blocks. If it's not
> needed then I think we should separate out the bitmap freeing logic and
> buddy freeing logic even further.
Yes, ext4_mb_load_buddy_gfp is not needed for clearing the bitmap; I put
it before the bit clearing to catch errors and abort the mark early, and
to reduce functional change.
>
> Thoughts?
>
>> /*
>> * We need to make sure we don't reuse the freed block until after the
>> * transaction is committed. We make an exception if the inode is to be
>> @@ -6541,13 +6549,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>> new_entry->efd_tid = handle->h_transaction->t_tid;
>>
>> ext4_lock_group(sb, block_group);
>> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>> ext4_mb_free_metadata(handle, &e4b, new_entry);
>> } else {
>> - /* need to update group_info->bb_free and bitmap
>> - * with group lock held. generate_buddy look at
>> - * them with group lock_held
>> - */
>> if (test_opt(sb, DISCARD)) {
>> err = ext4_issue_discard(sb, block_group, bit,
>> count_clusters, NULL);
>> @@ -6560,14 +6563,9 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>> EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info);
>>
>> ext4_lock_group(sb, block_group);
>> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>> mb_free_blocks(inode, &e4b, bit, count_clusters);
>> }
>>
>> - ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>> - ext4_free_group_clusters_set(sb, gdp, ret);
>> - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>> - ext4_group_desc_csum_set(sb, block_group, gdp);
>> ext4_unlock_group(sb, block_group);
>>
>> if (sbi->s_log_groups_per_flex) {
>
>
> Adding a piece of code here...
>
> if (sbi->s_log_groups_per_flex) {
> ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
> atomic64_add(count_clusters,
> &sbi_array_rcu_deref(sbi, s_flex_groups,
> flex_group)->free_clusters);
> }
>
> <...>
>
> /* We dirtied the bitmap block */
> BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
> err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
> /* And the group descriptor block */
> BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
> ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
> if (!err)
> err = ret;
>
> I was thinking even this could go alongside the bitmap freeing logic above.
> Then the next patch becomes very clear: all of the bitmap freeing logic
> is simply moved into the ext4_mb_mark_context() function.
>
> -ritesh
>
* Re: [PATCH v6 11/11] ext4: run mballoc test with different layouts setting
2023-09-01 14:36 ` Ritesh Harjani
@ 2023-09-04 3:01 ` Kemeng Shi
0 siblings, 0 replies; 36+ messages in thread
From: Kemeng Shi @ 2023-09-04 3:01 UTC (permalink / raw)
To: Ritesh Harjani, tytso, adilger.kernel, linux-ext4, linux-kernel
on 9/1/2023 10:36 PM, Ritesh Harjani wrote:
> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>
>> Use KUNIT_CASE_PARAM to run mbalaloc test with different layouts setting.
> ^^^ mballoc
> small nit below
>
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>> fs/ext4/mballoc-test.c | 52 ++++++++++++++++++++++++++++++------------
>> 1 file changed, 38 insertions(+), 14 deletions(-)
>>
>> diff --git a/fs/ext4/mballoc-test.c b/fs/ext4/mballoc-test.c
>> index d643c56ac003..af48a39c8ba2 100644
>> --- a/fs/ext4/mballoc-test.c
>> +++ b/fs/ext4/mballoc-test.c
>> @@ -196,21 +196,11 @@ static int ext4_mb_mark_context_stub(struct ext4_mark_context *mc,
>> return 0;
>> }
>>
>> -#define TEST_BLOCKSIZE_BITS 10
>> -#define TEST_CLUSTER_BITS 3
>> -#define TEST_BLOCKS_PER_GROUP 8192
>> -#define TEST_GROUP_COUNT 4
>> -#define TEST_DESC_SIZE 64
>> #define TEST_GOAL_GROUP 1
>> static int mbt_kunit_init(struct kunit *test)
>> {
>> - struct mbt_ext4_block_layout layout = {
>> - .blocksize_bits = TEST_BLOCKSIZE_BITS,
>> - .cluster_bits = TEST_CLUSTER_BITS,
>> - .blocks_per_group = TEST_BLOCKS_PER_GROUP,
>> - .group_count = TEST_GROUP_COUNT,
>> - .desc_size = TEST_DESC_SIZE,
>> - };
>> + struct mbt_ext4_block_layout *layout =
>> + (struct mbt_ext4_block_layout *)(test->param_value);
>> struct super_block *sb;
>> int ret;
>>
>> @@ -218,7 +208,7 @@ static int mbt_kunit_init(struct kunit *test)
>> if (sb == NULL)
>> return -ENOMEM;
>>
>> - mbt_init_sb_layout(sb, &layout);
>> + mbt_init_sb_layout(sb, layout);
>>
>> ret = mbt_ctx_init(sb);
>> if (ret != 0) {
>> @@ -304,9 +294,43 @@ static void test_new_blocks_simple(struct kunit *test)
>> "unexpectedly get block when no block is available");
>> }
>>
>> +static const struct mbt_ext4_block_layout mbt_test_layouts[] = {
>> + {
>> + .blocksize_bits = 10,
>> + .cluster_bits = 3,
>> + .blocks_per_group = 8192,
>> + .group_count = 4,
>> + .desc_size = 64,
>> + },
>> + {
>> + .blocksize_bits = 12,
>> + .cluster_bits = 3,
>> + .blocks_per_group = 8192,
>> + .group_count = 4,
>> + .desc_size = 64,
>> + },
>> + {
>> + .blocksize_bits = 18,
>
> 64k blocksize is more common due to platforms with 64k pagesize like
> Power and sometimes arm64. I would rather make it 16 here.
>
> I tested it on Power -
Sure, I will make it 16 in the next version. Thanks!
>
> [ 2.546687][ T1] KTAP version 1
> [ 2.547123][ T1] 1..2
> [ 2.547447][ T1] KTAP version 1
> [ 2.547927][ T1] # Subtest: ext4_mballoc_test
> [ 2.548562][ T1] 1..1
> [ 2.548933][ T1] KTAP version 1
> [ 2.549457][ T1] # Subtest: test_new_blocks_simple
> [ 2.549550][ T108] kunit_try_catch (108) used greatest stack depth: 14512 bytes left
> [ 2.549644][ T1] ok 1 block_bits=10 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
> [ 2.552780][ T110] kunit_try_catch (110) used greatest stack depth: 14464 bytes left
> [ 2.552882][ T1] ok 2 block_bits=12 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
> [ 2.555909][ T1] ok 3 block_bits=18 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
> [ 2.557184][ T1] # test_new_blocks_simple: pass:3 fail:0 skip:0 total:3
> [ 2.557186][ T1] ok 1 test_new_blocks_simple
> [ 2.558083][ T1] # Totals: pass:3 fail:0 skip:0 total:3
> [ 2.558688][ T1] ok 1 ext4_mballoc_test
>
> Looks good to me. Feel free to add -
>
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>
* Re: [PATCH v6 01/11] ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb
2023-09-04 2:50 ` Kemeng Shi
@ 2023-09-04 8:30 ` Ritesh Harjani
0 siblings, 0 replies; 36+ messages in thread
From: Ritesh Harjani @ 2023-09-04 8:30 UTC (permalink / raw)
To: Kemeng Shi, tytso, adilger.kernel, linux-ext4, linux-kernel
Kemeng Shi <shikemeng@huaweicloud.com> writes:
> on 8/31/2023 10:07 PM, Ritesh Harjani wrote:
>> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>>
>>> on 8/31/2023 8:33 PM, Ritesh Harjani wrote:
>>>> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>>>>
>>>> Hello Kemeng,
>>>>
>>>>> There are several reasons to add a general function to update block
>>>>> bitmap and group descriptor on disk:
>>>>
>>>> ... named ext4_mb_mark_context(<params>)
>>>>
>>>>> 1. pair behavior of alloc/free bits. For example,
>>>>> ext4_mb_new_blocks_simple will update free_clusters in struct flex_groups
>>>>> in ext4_mb_mark_bb while ext4_free_blocks_simple forgets this.
>>>>> 2. remove repeat code to read from disk, update and write back to disk.
>>>>> 3. reduce future unit test mocks to catch real IO to update structure
>>>>> on disk.
>>>>>
>>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>>> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
>>>>> ---
>>>>> fs/ext4/mballoc.c | 169 +++++++++++++++++++++++++++-------------------
>>>>> 1 file changed, 99 insertions(+), 70 deletions(-)
>>>>>
>>>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>>>> index c91db9f57524..e2be572deb75 100644
>>>>> --- a/fs/ext4/mballoc.c
>>>>> +++ b/fs/ext4/mballoc.c
>>>>> @@ -3952,6 +3952,100 @@ void ext4_exit_mballoc(void)
>>>>> ext4_groupinfo_destroy_slabs();
>>>>> }
>>>>>
>>>>> +/*
>>>>> + * Collect global setting to reduce the number of variable passing to
>>>>> + * ext4_mb_mark_context. Pass target group blocks range directly to
>>>>> + * reuse the prepared global setting for multiple block ranges and
>>>>> + * to show clearly the specific block range will be marked.
>>>>> + */
>>>>> +struct ext4_mark_context {
>>>>> + struct super_block *sb;
>>>>> + int state;
>>>>> +};
>>>>
>>>> This structure definition does not reflect its naming.
>>>> Why can't we also add cblk & clen, flags to it?
>>>>
>>>> I think the idea of defining a new function named
>>>> ext4_mb_prepare_mark_context() was that we can prepare "struct ext4_mark_context"
>>>> with different cblk, clen & flags arguments for cases where we might
>>>> have to call ext4_mb_mark_context() more than once in the same function
>>>> or call ext4_mb_mark_context() anywhere but at the start of the function.
>>>>
>>>> As I see it in the current series, we are calling
>>>> ext4_mb_prepare_mark_context() at the start of every function. Just for
>>>> this purpose we don't need an extra function, right? That we can directly do
>>>> at the time of declaring a structure variable itself (like how you did
>>>> in previous version)
>>>>
>>> Hi Ritesh, thanks for the reply. The ext4_mark_context structure aims to reduce
>>> the variables passed to ext4_mb_mark_context. If we have to prepare a lot of
>>> members in ext4_mb_prepare_mark_context, then the too-many-variables issue just
>>> moves into ext4_mb_prepare_mark_context.
>>> The name ext4_mark_context may not be proper. Actually I want a structure
>>> to collect information which is not strongly relevant to marking blk bits. In
>>> this way, we can initialize it at the beginning of the function; then there is no
>>> need to pay attention to it or to pass each field separately in every call to
>>> ext4_mb_mark_context. Instead, we focus on the useful information passed
>>> with each ext4_mb_mark_context call.
>>> This design also achieves the goal of preparing ext4_mark_context once for
>>> multiple uses in the same function, as ext4_mark_context is unlikely to change
>>> after initialization at the beginning.
>>>> What do you think of the approach where we add cblk, clen & flags
>>>> variables to ext4_mark_context()? Do you see any problems/difficulties
>>>> with that design?
>>>>
>>> The proposed design looks good to me. Please let me know if you still
>>> prefer this and I will do it in the next version. Thanks!
>>>
>>
>> I would have still preferred the block and len arguments inside struct
>> ext4_mark_context, because that better explains the use and definition of
>> the structure and its prepare function.
>> However, since this is not any functionality change, I am fine if you
>> prefer the current design (as you mentioned above).
>> We can always discuss & change it later too :)
>>
> Thanks for the review. Since more improvement is needed, I would like to
> define ext4_mark_context as you suggested in the previous version:
> ext4_mark_context {
> ext4_group_t mc_group; /* block group */
> ext4_grpblk_t mc_clblk; /* block in cluster units */
> ext4_grpblk_t mc_cllen; /* len in cluster units */
> ext4_grpblk_t mc_clupdates; /* number of clusters marked/unmarked */
> unsigned int mc_flags; /* flags ... */
> bool mc_state; /* to set or unset state */
> };
> And super_block and handle are passed as arguments.
>
> Besides, as we will pass a lot of arguments to the prepare function anyway,
> what about dropping it and simply passing all arguments to
> ext4_mb_mark_context directly:
> static inline void ext4_mb_mark_context(handle_t *handle,
> struct super_block *sb,
> int state,
> ext4_group_t group,
> ext4_grpblk_t blkoff,
> ext4_grpblk_t len,
> int flags,
> ext4_grpblk_t *changed)
> Looking forward to your reply. Thanks!
Sounds good to me. Thanks
-ritesh
* Re: [PATCH v6 05/11] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb()
2023-09-04 3:00 ` Kemeng Shi
@ 2023-09-12 7:02 ` Kemeng Shi
2023-09-12 10:13 ` Ritesh Harjani
0 siblings, 1 reply; 36+ messages in thread
From: Kemeng Shi @ 2023-09-12 7:02 UTC (permalink / raw)
To: Ritesh Harjani, tytso, adilger.kernel, linux-ext4, linux-kernel
on 9/4/2023 11:00 AM, Kemeng Shi wrote:
>
>
> on 9/1/2023 5:34 PM, Ritesh Harjani wrote:
>> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>>
>>> This patch separates block bitmap and buddy bitmap freeing in order to
>>> udpate block bitmap with ext4_mb_mark_context in following patch.
>> ^^^ update.
>>
>>>
>>> Separated freeing is safe with concurrent allocation as long as:
>>> 1. Firstly allocate block in buddy bitmap, and then in block bitmap.
>>> 2. Firstly free block in block bitmap, and then buddy bitmap.
>>> Then freed block will only be available to allocation when both buddy
>>> bitmap and block bitmap are updated by freeing.
>>> Allocation obeys rule 1 already, just do separated freeing with rule 2.
>>
>> So we also don't need ext4_mb_load_buddy_gfp() before freeing the on-disk
>> bitmap, right? Continuing below...
>>
>>>
>>> Separated freeing has no race with generate_buddy as:
>>> Once ext4_mb_load_buddy_gfp is executed successfully, the up-to-date
>>> buddy page can be found in sbi->s_buddy_cache and no more buddy
>>> initialization of the buddy page will be executed concurrently until
>>> buddy page is unloaded. As we always do free in "load buddy, free,
>>> unload buddy" sequence, separated freeing has no race with generate_buddy.
>>>
>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>> ---
>>> fs/ext4/mballoc.c | 18 ++++++++----------
>>> 1 file changed, 8 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>> index e650eac22237..01ad36a1cc96 100644
>>> --- a/fs/ext4/mballoc.c
>>> +++ b/fs/ext4/mballoc.c
>>> @@ -6519,6 +6519,14 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>> if (err)
>>> goto error_return;
>>
>>
>> Let me add a piece of code before the added changes and continue...
>>
>> err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
>> GFP_NOFS|__GFP_NOFAIL);
>> if (err)
>> goto error_return;
>>>
>>> + ext4_lock_group(sb, block_group);
>>> + mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>> + ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>>> + ext4_free_group_clusters_set(sb, gdp, ret);
>>> + ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>>> + ext4_group_desc_csum_set(sb, block_group, gdp);
>>> + ext4_unlock_group(sb, block_group);
>>> +
>>
>> ...Is it required for ext4_mb_load_buddy_gfp() to be called before
>> freeing on-disk bitmap blocks? Can you explain why, please?
>> At least it is not very clear to me why we need
>> ext4_mb_load_buddy_gfp() before the clearing of bitmap blocks. If it's not
>> needed then I think we should separate out the bitmap freeing logic and
>> buddy freeing logic even further.
> Yes, ext4_mb_load_buddy_gfp is not needed for clearing the bitmap; I put
> it before the bit clearing to catch errors and abort the mark early, and
> to reduce functional change.
Hi Ritesh, sorry to bother you. I'm going to push a new version and I prefer
to call ext4_mb_load_buddy_gfp early to catch potential errors in advance.
Just wondering if this is good with you. I'd like to hear any advice. Thanks!
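Concretely, the ordering I have in mind (a sketch of the preferred shape):

        /* load the buddy first so errors abort before the bitmap is touched */
        err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
                                     GFP_NOFS|__GFP_NOFAIL);
        if (err)
                goto error_return;

        /* only then clear the on-disk bitmap via ext4_mb_mark_context() */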
>>
>> Thoughts?
>>
>>> /*
>>> * We need to make sure we don't reuse the freed block until after the
>>> * transaction is committed. We make an exception if the inode is to be
>>> @@ -6541,13 +6549,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>> new_entry->efd_tid = handle->h_transaction->t_tid;
>>>
>>> ext4_lock_group(sb, block_group);
>>> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>> ext4_mb_free_metadata(handle, &e4b, new_entry);
>>> } else {
>>> - /* need to update group_info->bb_free and bitmap
>>> - * with group lock held. generate_buddy look at
>>> - * them with group lock_held
>>> - */
>>> if (test_opt(sb, DISCARD)) {
>>> err = ext4_issue_discard(sb, block_group, bit,
>>> count_clusters, NULL);
>>> @@ -6560,14 +6563,9 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>> EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info);
>>>
>>> ext4_lock_group(sb, block_group);
>>> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>> mb_free_blocks(inode, &e4b, bit, count_clusters);
>>> }
>>>
>>> - ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>>> - ext4_free_group_clusters_set(sb, gdp, ret);
>>> - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>>> - ext4_group_desc_csum_set(sb, block_group, gdp);
>>> ext4_unlock_group(sb, block_group);
>>>
>>> if (sbi->s_log_groups_per_flex) {
>>
>>
>> Adding a piece of code here...
>>
>> if (sbi->s_log_groups_per_flex) {
>> ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
>> atomic64_add(count_clusters,
>> &sbi_array_rcu_deref(sbi, s_flex_groups,
>> flex_group)->free_clusters);
>> }
>>
>> <...>
>>
>> /* We dirtied the bitmap block */
>> BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
>> err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
>> /* And the group descriptor block */
>> BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
>> ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
>> if (!err)
>> err = ret;
>>
>> I was thinking even this could go alongside the bitmap freeing logic above.
>> Then the next patch becomes very clear: all of the bitmap freeing logic
>> is simply moved into the ext4_mb_mark_context() function.
>>
>> -ritesh
>>
>
>
* Re: [PATCH v6 05/11] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb()
2023-09-12 7:02 ` Kemeng Shi
@ 2023-09-12 10:13 ` Ritesh Harjani
2023-09-12 11:32 ` Kemeng Shi
0 siblings, 1 reply; 36+ messages in thread
From: Ritesh Harjani @ 2023-09-12 10:13 UTC (permalink / raw)
To: Kemeng Shi, tytso, adilger.kernel, linux-ext4, linux-kernel
Kemeng Shi <shikemeng@huaweicloud.com> writes:
> on 9/4/2023 11:00 AM, Kemeng Shi wrote:
>>
>>
>> on 9/1/2023 5:34 PM, Ritesh Harjani wrote:
>>> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>>>
>>>> This patch separates block bitmap and buddy bitmap freeing in order to
>>>> udpate block bitmap with ext4_mb_mark_context in following patch.
>>> ^^^ update.
>>>
>>>>
>>>> Separated freeing is safe with concurrent allocation as long as:
>>>> 1. Allocation marks the buddy bitmap first, and then the block bitmap.
>>>> 2. Freeing clears the block bitmap first, and then the buddy bitmap.
>>>> Then a freed block only becomes available for allocation once both the
>>>> buddy bitmap and the block bitmap have been updated by the free.
>>>> Allocation obeys rule 1 already; just do separated freeing per rule 2.
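>>>>
>>>> A minimal userspace sketch of these two rules (plain C with
>>>> hypothetical names, not the ext4 code) might look like this; an
>>>> allocator that consults only the buddy view can never grab a
>>>> half-freed cluster, because the buddy bit is cleared last:
>>>>
>>>> #include <stdio.h>
>>>>
>>>> static unsigned char buddy_bitmap;  /* bit set = in use (buddy view) */
>>>> static unsigned char block_bitmap;  /* bit set = in use (on-disk view) */
>>>>
>>>> static int alloc_cluster(int bit)
>>>> {
>>>>         unsigned char mask = 1u << bit;
>>>>
>>>>         /* Rule 1: check and mark the buddy first, then the block bitmap. */
>>>>         if (buddy_bitmap & mask)
>>>>                 return -1;      /* busy, or a free is still in flight */
>>>>         buddy_bitmap |= mask;
>>>>         block_bitmap |= mask;
>>>>         return 0;
>>>> }
>>>>
>>>> static void free_cluster(int bit)
>>>> {
>>>>         unsigned char mask = 1u << bit;
>>>>
>>>>         /* Rule 2: clear the block bitmap first, then the buddy. Between
>>>>          * the two steps the cluster is still busy in the buddy view. */
>>>>         block_bitmap &= ~mask;
>>>>         buddy_bitmap &= ~mask;
>>>> }
>>>>
>>>> int main(void)
>>>> {
>>>>         alloc_cluster(3);
>>>>         free_cluster(3);
>>>>         printf("buddy=%#x block=%#x\n",
>>>>                (unsigned)buddy_bitmap, (unsigned)block_bitmap);
>>>>         return 0;
>>>> }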
>>>
>>> So we also don't need ext4_mb_load_buddy_gfp() before freeing the
>>> on-disk bitmap, right? Continued below...
>>>
>>>>
>>>> Separated freeing has no race with generate_buddy as:
>>>> Once ext4_mb_load_buddy_gfp is executed successfully, the up-to-date
>>>> buddy page can be found in sbi->s_buddy_cache, and no further buddy
>>>> initialization of the buddy page will run concurrently until the buddy
>>>> page is unloaded. As we always free in a "load buddy, free, unload
>>>> buddy" sequence, separated freeing has no race with generate_buddy.
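>>>>
>>>> The "load buddy, free, unload buddy" pinning could be sketched in
>>>> userspace like this (hypothetical names and a bare refcount, not the
>>>> ext4 code):
>>>>
>>>> #include <stdio.h>
>>>>
>>>> struct buddy_page {
>>>>         int refcount;
>>>>         int initialized;
>>>> };
>>>>
>>>> /* While refcount > 0 the page is pinned: it is initialized at most
>>>>  * once and cannot be dropped and rebuilt from the on-disk bitmap. */
>>>> static void load_buddy(struct buddy_page *p)
>>>> {
>>>>         if (p->refcount++ == 0 && !p->initialized)
>>>>                 p->initialized = 1;  /* one-time init from on-disk bitmap */
>>>> }
>>>>
>>>> static void unload_buddy(struct buddy_page *p)
>>>> {
>>>>         if (--p->refcount == 0)
>>>>                 p->initialized = 0;  /* reclaim may now drop the page */
>>>> }
>>>>
>>>> int main(void)
>>>> {
>>>>         struct buddy_page p = { 0, 0 };
>>>>
>>>>         load_buddy(&p);   /* pin: no re-init while the ref is held */
>>>>         /* ... clear bits in the block bitmap, then in the buddy ... */
>>>>         unload_buddy(&p);
>>>>         printf("refcount=%d\n", p.refcount);
>>>>         return 0;
>>>> }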
>>>>
>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>> ---
>>>> fs/ext4/mballoc.c | 18 ++++++++----------
>>>> 1 file changed, 8 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>>> index e650eac22237..01ad36a1cc96 100644
>>>> --- a/fs/ext4/mballoc.c
>>>> +++ b/fs/ext4/mballoc.c
>>>> @@ -6519,6 +6519,14 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>> if (err)
>>>> goto error_return;
>>>
>>>
>>> Let me add a piece of code before the added changes and continue...
>>>
>>> err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
>>> GFP_NOFS|__GFP_NOFAIL);
>>> if (err)
>>> goto error_return;
>>>>
>>>> + ext4_lock_group(sb, block_group);
>>>> + mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>>> + ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>>>> + ext4_free_group_clusters_set(sb, gdp, ret);
>>>> + ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>>>> + ext4_group_desc_csum_set(sb, block_group, gdp);
>>>> + ext4_unlock_group(sb, block_group);
>>>> +
>>>
>>> ...Is it required for ext4_mb_load_buddy_gfp() to be called before
>>> freeing on-disk bitmap blocks? Can you explain why, please?
>>> At least it is not very clear to me why we need
>>> ext4_mb_load_buddy_gfp() before the clearing of bitmap blocks. If it's
>>> not needed, then I think we should separate out the bitmap freeing
>>> logic and the buddy freeing logic even further.
>> Yes, ext4_mb_load_buddy_gfp is not needed for clearing the bitmap; I put
Earlier you mentioned that there was no dependency.
>> it before the bit clearing to catch errors and abort the marking early,
>> to reduce functional change.
> Hi Ritesh, sorry to bother you. I'm going to push a new version, and I prefer
> to call ext4_mb_load_buddy_gfp early to catch potential errors in advance.
> Just wondering whether this is fine with you; I'd like to hear any advice. Thanks!
Isn't there actually a dependency that we should always call
ext4_mb_load_buddy_gfp() first, before clearing the bitmap?
Say we take care of the bitmap handling first, i.e. we clear the
bitmap and also call ext4_handle_dirty_metadata() first, and only later
call ext4_mb_load_buddy_gfp(). Now also assume the buddy pages were
not already loaded in memory (reclaimed due to memory pressure); in
that case we will read the block bitmap of that group from the on-disk
copy, which we already changed above.
So that means the buddy handling logic will already find the block
bitmap bits cleared and should report problems, right?
Then isn't there an actual dependency here, meaning
ext4_mb_load_buddy_gfp() should always be called before the bitmap
handling logic? Thoughts?
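
To make that concrete, here is a minimal userspace sketch of the hazard
(hypothetical names and a deliberately single-threaded sequence, not the
actual ext4 code):

#include <stdio.h>
#include <stdbool.h>

static unsigned char disk_bitmap = 0xFF;  /* 8 clusters, all in use */
static unsigned char buddy;               /* in-memory buddy copy */
static bool buddy_cached;                 /* false: page was reclaimed */

static void load_buddy(void)
{
        if (!buddy_cached) {      /* rebuild the buddy from disk */
                buddy = disk_bitmap;
                buddy_cached = true;
        }
}

int main(void)
{
        /* Wrong order: clear the on-disk bits first... */
        disk_bitmap &= (unsigned char)~0x0F;  /* free clusters 0-3 on disk */

        /* ...then load the buddy after memory pressure dropped the page:
         * the rebuilt buddy already has the bits cleared, so freeing the
         * same range in the buddy now looks like a double free. */
        load_buddy();

        printf("disk=%#x buddy=%#x\n", (unsigned)disk_bitmap, (unsigned)buddy);
        return 0;
}

Calling load_buddy() before touching disk_bitmap pins a consistent copy
first, which is the dependency described above.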
-ritesh
>>>
>>> Thoughts?
>>>
>>>> /*
>>>> * We need to make sure we don't reuse the freed block until after the
>>>> * transaction is committed. We make an exception if the inode is to be
>>>> @@ -6541,13 +6549,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>> new_entry->efd_tid = handle->h_transaction->t_tid;
>>>>
>>>> ext4_lock_group(sb, block_group);
>>>> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>>> ext4_mb_free_metadata(handle, &e4b, new_entry);
>>>> } else {
>>>> - /* need to update group_info->bb_free and bitmap
>>>> - * with group lock held. generate_buddy look at
>>>> - * them with group lock_held
>>>> - */
>>>> if (test_opt(sb, DISCARD)) {
>>>> err = ext4_issue_discard(sb, block_group, bit,
>>>> count_clusters, NULL);
>>>> @@ -6560,14 +6563,9 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>> EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info);
>>>>
>>>> ext4_lock_group(sb, block_group);
>>>> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>>> mb_free_blocks(inode, &e4b, bit, count_clusters);
>>>> }
>>>>
>>>> - ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>>>> - ext4_free_group_clusters_set(sb, gdp, ret);
>>>> - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>>>> - ext4_group_desc_csum_set(sb, block_group, gdp);
>>>> ext4_unlock_group(sb, block_group);
>>>>
>>>> if (sbi->s_log_groups_per_flex) {
>>>
>>>
>>> Adding a piece of code here...
>>>
>>> if (sbi->s_log_groups_per_flex) {
>>> ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
>>> atomic64_add(count_clusters,
>>> &sbi_array_rcu_deref(sbi, s_flex_groups,
>>> flex_group)->free_clusters);
>>> }
>>>
>>> <...>
>>>
>>> /* We dirtied the bitmap block */
>>> BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
>>> err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
>>> /* And the group descriptor block */
>>> BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
>>> ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
>>> if (!err)
>>> err = ret;
>>>
>>> I was thinking even this could go next to the bitmap freeing logic
>>> above, so that the next patch makes it very clear that all of the
>>> bitmap freeing logic is simply moved into the ext4_mb_mark_context()
>>> function.
>>>
>>> -ritesh
>>>
>>
>>
* Re: [PATCH v6 05/11] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb()
2023-09-12 10:13 ` Ritesh Harjani
@ 2023-09-12 11:32 ` Kemeng Shi
0 siblings, 0 replies; 36+ messages in thread
From: Kemeng Shi @ 2023-09-12 11:32 UTC (permalink / raw)
To: Ritesh Harjani, tytso, adilger.kernel, linux-ext4, linux-kernel
on 9/12/2023 6:13 PM, Ritesh Harjani wrote:
> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>
>> on 9/4/2023 11:00 AM, Kemeng Shi wrote:
>>>
>>>
>>> on 9/1/2023 5:34 PM, Ritesh Harjani wrote:
>>>> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>>>>
>>>>> This patch separates block bitmap and buddy bitmap freeing in order to
>>>>> udpate block bitmap with ext4_mb_mark_context in following patch.
>>>> ^^^ update.
>>>>
>>>>>
>>>>> Separated freeing is safe with concurrent allocation as long as:
>>>>> 1. Allocation marks the buddy bitmap first, and then the block bitmap.
>>>>> 2. Freeing clears the block bitmap first, and then the buddy bitmap.
>>>>> Then a freed block only becomes available for allocation once both the
>>>>> buddy bitmap and the block bitmap have been updated by the free.
>>>>> Allocation obeys rule 1 already; just do separated freeing per rule 2.
>>>>
>>>> So we also don't need ext4_mb_load_buddy_gfp() before freeing the
>>>> on-disk bitmap, right? Continued below...
>>>>
>>>>>
>>>>> Separated freeing has no race with generate_buddy as:
>>>>> Once ext4_mb_load_buddy_gfp is executed successfully, the up-to-date
>>>>> buddy page can be found in sbi->s_buddy_cache, and no further buddy
>>>>> initialization of the buddy page will run concurrently until the buddy
>>>>> page is unloaded. As we always free in a "load buddy, free, unload
>>>>> buddy" sequence, separated freeing has no race with generate_buddy.
>>>>>
>>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>>> ---
>>>>> fs/ext4/mballoc.c | 18 ++++++++----------
>>>>> 1 file changed, 8 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>>>> index e650eac22237..01ad36a1cc96 100644
>>>>> --- a/fs/ext4/mballoc.c
>>>>> +++ b/fs/ext4/mballoc.c
>>>>> @@ -6519,6 +6519,14 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>>> if (err)
>>>>> goto error_return;
>>>>
>>>>
>>>> Let me add a piece of code before the added changes and continue...
>>>>
>>>> err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
>>>> GFP_NOFS|__GFP_NOFAIL);
>>>> if (err)
>>>> goto error_return;
>>>>>
>>>>> + ext4_lock_group(sb, block_group);
>>>>> + mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>>>> + ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>>>>> + ext4_free_group_clusters_set(sb, gdp, ret);
>>>>> + ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>>>>> + ext4_group_desc_csum_set(sb, block_group, gdp);
>>>>> + ext4_unlock_group(sb, block_group);
>>>>> +
>>>>
>>>> ...Is it required for ext4_mb_load_buddy_gfp() to be called before
>>>> freeing on-disk bitmap blocks? Can you explain why, please?
>>>> At least it is not very clear to me why we need
>>>> ext4_mb_load_buddy_gfp() before the clearing of bitmap blocks. If it's
>>>> not needed, then I think we should separate out the bitmap freeing
>>>> logic and the buddy freeing logic even further.
>>> Yes, ext4_mb_load_buddy_gfp is not needed for clearing the bitmap; I put
>
> Earlier you mentioned that there was no dependency.
>
>
>>> it before the bit clearing to catch errors and abort the marking early,
>>> to reduce functional change.
>> Hi Ritesh, sorry to bother you. I'm going to push a new version, and I prefer
>> to call ext4_mb_load_buddy_gfp early to catch potential errors in advance.
>> Just wondering whether this is fine with you; I'd like to hear any advice. Thanks!
>
> Isn't there actually a dependency that we should always call
> ext4_mb_load_buddy_gfp() first, before clearing the bitmap?
>
> Say we take care of the bitmap handling first, i.e. we clear the
> bitmap and also call ext4_handle_dirty_metadata() first, and only later
> call ext4_mb_load_buddy_gfp(). Now also assume the buddy pages were
> not already loaded in memory (reclaimed due to memory pressure); in
> that case we will read the block bitmap of that group from the on-disk
> copy, which we already changed above.
> So that means the buddy handling logic will already find the block
> bitmap bits cleared and should report problems, right?
>
> Then isn't there an actual dependency here, meaning
> ext4_mb_load_buddy_gfp() should always be called before the bitmap
> handling logic? Thoughts?
>
My fault; ext4_mb_load_buddy_gfp should indeed be called before the
on-disk bitmap handling, in case the buddy page was not loaded...
> -ritesh
>
>
>>>>
>>>> Thoughts?
>>>>
>>>>> /*
>>>>> * We need to make sure we don't reuse the freed block until after the
>>>>> * transaction is committed. We make an exception if the inode is to be
>>>>> @@ -6541,13 +6549,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>>> new_entry->efd_tid = handle->h_transaction->t_tid;
>>>>>
>>>>> ext4_lock_group(sb, block_group);
>>>>> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>>>> ext4_mb_free_metadata(handle, &e4b, new_entry);
>>>>> } else {
>>>>> - /* need to update group_info->bb_free and bitmap
>>>>> - * with group lock held. generate_buddy look at
>>>>> - * them with group lock_held
>>>>> - */
>>>>> if (test_opt(sb, DISCARD)) {
>>>>> err = ext4_issue_discard(sb, block_group, bit,
>>>>> count_clusters, NULL);
>>>>> @@ -6560,14 +6563,9 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>>> EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info);
>>>>>
>>>>> ext4_lock_group(sb, block_group);
>>>>> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>>>> mb_free_blocks(inode, &e4b, bit, count_clusters);
>>>>> }
>>>>>
>>>>> - ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>>>>> - ext4_free_group_clusters_set(sb, gdp, ret);
>>>>> - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>>>>> - ext4_group_desc_csum_set(sb, block_group, gdp);
>>>>> ext4_unlock_group(sb, block_group);
>>>>>
>>>>> if (sbi->s_log_groups_per_flex) {
>>>>
>>>>
>>>> Adding a piece of code here...
>>>>
>>>> if (sbi->s_log_groups_per_flex) {
>>>> ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
>>>> atomic64_add(count_clusters,
>>>> &sbi_array_rcu_deref(sbi, s_flex_groups,
>>>> flex_group)->free_clusters);
>>>> }
>>>>
>>>> <...>
>>>>
>>>> /* We dirtied the bitmap block */
>>>> BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
>>>> err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
>>>> /* And the group descriptor block */
>>>> BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
>>>> ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
>>>> if (!err)
>>>> err = ret;
>>>>
>>>> I was thinking even this could go next to the bitmap freeing logic
>>>> above, so that the next patch makes it very clear that all of the
>>>> bitmap freeing logic is simply moved into the ext4_mb_mark_context()
>>>> function.
>>>>
>>>> -ritesh
>>>>
>>>
>>>
>
Thread overview: 36+ messages
2023-08-26 15:50 [PATCH v6 00/11] cleanups and unit test for mballoc Kemeng Shi
2023-08-26 15:50 ` [PATCH v6 01/11] ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb Kemeng Shi
2023-08-31 12:33 ` Ritesh Harjani
2023-08-31 13:42 ` Kemeng Shi
2023-08-31 14:07 ` Ritesh Harjani
2023-09-04 2:50 ` Kemeng Shi
2023-09-04 8:30 ` Ritesh Harjani
2023-08-26 15:50 ` [PATCH v6 02/11] ext4: call ext4_mb_mark_context in ext4_free_blocks_simple Kemeng Shi
2023-08-31 14:25 ` Ritesh Harjani
2023-09-04 2:51 ` Kemeng Shi
2023-08-26 15:50 ` [PATCH v6 03/11] ext4: extent ext4_mb_mark_context to support allocation under journal Kemeng Shi
2023-08-31 15:51 ` Ritesh Harjani
2023-08-26 15:50 ` [PATCH v6 04/11] ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used Kemeng Shi
2023-09-01 3:51 ` Ritesh Harjani
2023-09-04 2:54 ` Kemeng Shi
2023-08-26 15:50 ` [PATCH v6 05/11] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb() Kemeng Shi
2023-09-01 9:34 ` Ritesh Harjani
2023-09-04 3:00 ` Kemeng Shi
2023-09-12 7:02 ` Kemeng Shi
2023-09-12 10:13 ` Ritesh Harjani
2023-09-12 11:32 ` Kemeng Shi
2023-08-26 15:50 ` [PATCH v6 06/11] ext4: call ext4_mb_mark_context in ext4_mb_clear_bb Kemeng Shi
2023-09-01 9:38 ` Ritesh Harjani
2023-08-26 15:50 ` [PATCH v6 07/11] ext4: Separate block bitmap and buddy bitmap freeing in ext4_group_add_blocks() Kemeng Shi
2023-08-26 15:50 ` [PATCH v6 08/11] ext4: call ext4_mb_mark_context " Kemeng Shi
2023-09-01 9:50 ` Ritesh Harjani
2023-08-26 15:50 ` [PATCH v6 09/11] ext4: add some kunit stub for mballoc kunit test Kemeng Shi
2023-09-01 14:18 ` Ritesh Harjani
2023-08-26 15:50 ` [PATCH v6 10/11] ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc Kemeng Shi
2023-09-01 14:29 ` Ritesh Harjani
2023-08-26 15:50 ` [PATCH v6 11/11] ext4: run mballoc test with different layouts setting Kemeng Shi
2023-09-01 14:36 ` Ritesh Harjani
2023-09-04 3:01 ` Kemeng Shi
2023-08-29 19:02 ` [PATCH v6 00/11] cleanups and unit test for mballoc Ritesh Harjani
2023-08-30 7:22 ` Kemeng Shi
2023-08-31 14:35 ` Ritesh Harjani