* [PATCH v5 01/10] ext4: factor out a common helper to query extent map
2024-05-17 12:39 [PATCH v5 00/10] ext4: support adding multi-delalloc blocks Zhang Yi
@ 2024-05-17 12:39 ` Zhang Yi
2024-05-17 16:19 ` Markus Elfring
2024-05-17 12:39 ` [PATCH v5 02/10] ext4: check the extent status again before inserting delalloc block Zhang Yi
` (9 subsequent siblings)
10 siblings, 1 reply; 20+ messages in thread
From: Zhang Yi @ 2024-05-17 12:39 UTC (permalink / raw)
To: linux-ext4
Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3
From: Zhang Yi <yi.zhang@huawei.com>
Factor out a new common helper ext4_map_query_blocks() from the
ext4_da_map_blocks(), it query and return the extent map status on the
inode's extent path, no logic changes.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
fs/ext4/inode.c | 57 +++++++++++++++++++++++++++----------------------
1 file changed, 32 insertions(+), 25 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 537803250ca9..6a41172c06e1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -453,6 +453,35 @@ static void ext4_map_blocks_es_recheck(handle_t *handle,
}
#endif /* ES_AGGRESSIVE_TEST */
+static int ext4_map_query_blocks(handle_t *handle, struct inode *inode,
+ struct ext4_map_blocks *map)
+{
+ unsigned int status;
+ int retval;
+
+ if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
+ retval = ext4_ext_map_blocks(handle, inode, map, 0);
+ else
+ retval = ext4_ind_map_blocks(handle, inode, map, 0);
+
+ if (retval <= 0)
+ return retval;
+
+ if (unlikely(retval != map->m_len)) {
+ ext4_warning(inode->i_sb,
+ "ES len assertion failed for inode "
+ "%lu: retval %d != map->m_len %d",
+ inode->i_ino, retval, map->m_len);
+ WARN_ON(1);
+ }
+
+ status = map->m_flags & EXT4_MAP_UNWRITTEN ?
+ EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
+ ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
+ map->m_pblk, status);
+ return retval;
+}
+
/*
* The ext4_map_blocks() function tries to look up the requested blocks,
* and returns if the blocks are already mapped.
@@ -1744,33 +1773,11 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
down_read(&EXT4_I(inode)->i_data_sem);
if (ext4_has_inline_data(inode))
retval = 0;
- else if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
- retval = ext4_ext_map_blocks(NULL, inode, map, 0);
else
- retval = ext4_ind_map_blocks(NULL, inode, map, 0);
- if (retval < 0) {
- up_read(&EXT4_I(inode)->i_data_sem);
- return retval;
- }
- if (retval > 0) {
- unsigned int status;
-
- if (unlikely(retval != map->m_len)) {
- ext4_warning(inode->i_sb,
- "ES len assertion failed for inode "
- "%lu: retval %d != map->m_len %d",
- inode->i_ino, retval, map->m_len);
- WARN_ON(1);
- }
-
- status = map->m_flags & EXT4_MAP_UNWRITTEN ?
- EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
- ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
- map->m_pblk, status);
- up_read(&EXT4_I(inode)->i_data_sem);
- return retval;
- }
+ retval = ext4_map_query_blocks(NULL, inode, map);
up_read(&EXT4_I(inode)->i_data_sem);
+ if (retval)
+ return retval;
add_delayed:
down_write(&EXT4_I(inode)->i_data_sem);
--
2.39.2
^ permalink raw reply related [flat|nested] 20+ messages in thread* Re: [PATCH v5 01/10] ext4: factor out a common helper to query extent map
2024-05-17 12:39 ` [PATCH v5 01/10] ext4: factor out a common helper to query extent map Zhang Yi
@ 2024-05-17 16:19 ` Markus Elfring
0 siblings, 0 replies; 20+ messages in thread
From: Markus Elfring @ 2024-05-17 16:19 UTC (permalink / raw)
To: Zhang Yi, linux-ext4, linux-fsdevel, kernel-janitors
Cc: LKML, Andreas Dilger, Jan Kara, Ritesh Harjani, Theodore Ts'o,
Yu Kuai, Zhang Yi, Zhihao Cheng
…
> ext4_da_map_blocks(), it query and return the extent map status on the
> inode's extent path, no logic changes.
Please improve this change description another bit.
Regards,
Markus
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v5 02/10] ext4: check the extent status again before inserting delalloc block
2024-05-17 12:39 [PATCH v5 00/10] ext4: support adding multi-delalloc blocks Zhang Yi
2024-05-17 12:39 ` [PATCH v5 01/10] ext4: factor out a common helper to query extent map Zhang Yi
@ 2024-05-17 12:39 ` Zhang Yi
2024-05-17 12:39 ` [PATCH v5 03/10] ext4: warn if delalloc counters are not zero on inactive Zhang Yi
` (8 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Zhang Yi @ 2024-05-17 12:39 UTC (permalink / raw)
To: linux-ext4
Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3
From: Zhang Yi <yi.zhang@huawei.com>
ext4_da_map_blocks looks up for any extent entry in the extent status
tree (w/o i_data_sem) and then the looks up for any ondisk extent
mapping (with i_data_sem in read mode).
If it finds a hole in the extent status tree or if it couldn't find any
entry at all, it then takes the i_data_sem in write mode to add a da
entry into the extent status tree. This can actually race with page
mkwrite & fallocate path.
Note that this is ok between
1. ext4 buffered-write path v/s ext4_page_mkwrite(), because of the
folio lock
2. ext4 buffered write path v/s ext4 fallocate because of the inode
lock.
But this can race between ext4_page_mkwrite() & ext4 fallocate path
ext4_page_mkwrite() ext4_fallocate()
block_page_mkwrite()
ext4_da_map_blocks()
//find hole in extent status tree
ext4_alloc_file_blocks()
ext4_map_blocks()
//allocate block and unwritten extent
ext4_insert_delayed_block()
ext4_da_reserve_space()
//reserve one more block
ext4_es_insert_delayed_block()
//drop unwritten extent and add delayed extent by mistake
Then, the delalloc extent is wrong until writeback and the extra
reserved block can't be released any more and it triggers below warning:
EXT4-fs (pmem2): Inode 13 (00000000bbbd4d23): i_reserved_data_blocks(1) not cleared!
Fix the problem by looking up extent status tree again while the
i_data_sem is held in write mode. If it still can't find any entry, then
we insert a new da entry into the extent status tree.
Cc: stable@vger.kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
fs/ext4/inode.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6a41172c06e1..6114ca79f464 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1737,6 +1737,7 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
if (ext4_es_is_hole(&es))
goto add_delayed;
+found:
/*
* Delayed extent could be allocated by fallocate.
* So we need to check it.
@@ -1781,6 +1782,26 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
add_delayed:
down_write(&EXT4_I(inode)->i_data_sem);
+ /*
+ * Page fault path (ext4_page_mkwrite does not take i_rwsem)
+ * and fallocate path (no folio lock) can race. Make sure we
+ * lookup the extent status tree here again while i_data_sem
+ * is held in write mode, before inserting a new da entry in
+ * the extent status tree.
+ */
+ if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
+ if (!ext4_es_is_hole(&es)) {
+ up_write(&EXT4_I(inode)->i_data_sem);
+ goto found;
+ }
+ } else if (!ext4_has_inline_data(inode)) {
+ retval = ext4_map_query_blocks(NULL, inode, map);
+ if (retval) {
+ up_write(&EXT4_I(inode)->i_data_sem);
+ return retval;
+ }
+ }
+
retval = ext4_insert_delayed_block(inode, map->m_lblk);
up_write(&EXT4_I(inode)->i_data_sem);
if (retval)
--
2.39.2
^ permalink raw reply related [flat|nested] 20+ messages in thread* [PATCH v5 03/10] ext4: warn if delalloc counters are not zero on inactive
2024-05-17 12:39 [PATCH v5 00/10] ext4: support adding multi-delalloc blocks Zhang Yi
2024-05-17 12:39 ` [PATCH v5 01/10] ext4: factor out a common helper to query extent map Zhang Yi
2024-05-17 12:39 ` [PATCH v5 02/10] ext4: check the extent status again before inserting delalloc block Zhang Yi
@ 2024-05-17 12:39 ` Zhang Yi
2024-05-20 9:35 ` Jan Kara
2024-09-24 3:25 ` Lai, Yi
2024-05-17 12:39 ` [PATCH v5 04/10] ext4: trim delalloc extent Zhang Yi
` (7 subsequent siblings)
10 siblings, 2 replies; 20+ messages in thread
From: Zhang Yi @ 2024-05-17 12:39 UTC (permalink / raw)
To: linux-ext4
Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3
From: Zhang Yi <yi.zhang@huawei.com>
The per-inode i_reserved_data_blocks count the reserved delalloc blocks
in a regular file, it should be zero when destroying the file. The
per-fs s_dirtyclusters_counter count all reserved delalloc blocks in a
filesystem, it also should be zero when umounting the filesystem. Now we
have only an error message if the i_reserved_data_blocks is not zero,
which is unable to be simply captured, so add WARN_ON_ONCE to make it
more visable.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
fs/ext4/super.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 044135796f2b..b68064c877e3 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1343,6 +1343,9 @@ static void ext4_put_super(struct super_block *sb)
ext4_group_desc_free(sbi);
ext4_flex_groups_free(sbi);
+
+ WARN_ON_ONCE(!(sbi->s_mount_state & EXT4_ERROR_FS) &&
+ percpu_counter_sum(&sbi->s_dirtyclusters_counter));
ext4_percpu_param_destroy(sbi);
#ifdef CONFIG_QUOTA
for (int i = 0; i < EXT4_MAXQUOTAS; i++)
@@ -1473,7 +1476,8 @@ static void ext4_destroy_inode(struct inode *inode)
dump_stack();
}
- if (EXT4_I(inode)->i_reserved_data_blocks)
+ if (!(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_ERROR_FS) &&
+ WARN_ON_ONCE(EXT4_I(inode)->i_reserved_data_blocks))
ext4_msg(inode->i_sb, KERN_ERR,
"Inode %lu (%p): i_reserved_data_blocks (%u) not cleared!",
inode->i_ino, EXT4_I(inode),
--
2.39.2
^ permalink raw reply related [flat|nested] 20+ messages in thread* Re: [PATCH v5 03/10] ext4: warn if delalloc counters are not zero on inactive
2024-05-17 12:39 ` [PATCH v5 03/10] ext4: warn if delalloc counters are not zero on inactive Zhang Yi
@ 2024-05-20 9:35 ` Jan Kara
2024-09-24 3:25 ` Lai, Yi
1 sibling, 0 replies; 20+ messages in thread
From: Jan Kara @ 2024-05-20 9:35 UTC (permalink / raw)
To: Zhang Yi
Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3
On Fri 17-05-24 20:39:58, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
>
> The per-inode i_reserved_data_blocks count the reserved delalloc blocks
> in a regular file, it should be zero when destroying the file. The
> per-fs s_dirtyclusters_counter count all reserved delalloc blocks in a
> filesystem, it also should be zero when umounting the filesystem. Now we
> have only an error message if the i_reserved_data_blocks is not zero,
> which is unable to be simply captured, so add WARN_ON_ONCE to make it
> more visable.
>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/ext4/super.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 044135796f2b..b68064c877e3 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1343,6 +1343,9 @@ static void ext4_put_super(struct super_block *sb)
>
> ext4_group_desc_free(sbi);
> ext4_flex_groups_free(sbi);
> +
> + WARN_ON_ONCE(!(sbi->s_mount_state & EXT4_ERROR_FS) &&
> + percpu_counter_sum(&sbi->s_dirtyclusters_counter));
> ext4_percpu_param_destroy(sbi);
> #ifdef CONFIG_QUOTA
> for (int i = 0; i < EXT4_MAXQUOTAS; i++)
> @@ -1473,7 +1476,8 @@ static void ext4_destroy_inode(struct inode *inode)
> dump_stack();
> }
>
> - if (EXT4_I(inode)->i_reserved_data_blocks)
> + if (!(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_ERROR_FS) &&
> + WARN_ON_ONCE(EXT4_I(inode)->i_reserved_data_blocks))
> ext4_msg(inode->i_sb, KERN_ERR,
> "Inode %lu (%p): i_reserved_data_blocks (%u) not cleared!",
> inode->i_ino, EXT4_I(inode),
> --
> 2.39.2
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v5 03/10] ext4: warn if delalloc counters are not zero on inactive
2024-05-17 12:39 ` [PATCH v5 03/10] ext4: warn if delalloc counters are not zero on inactive Zhang Yi
2024-05-20 9:35 ` Jan Kara
@ 2024-09-24 3:25 ` Lai, Yi
2024-09-24 8:38 ` Zhang Yi
1 sibling, 1 reply; 20+ messages in thread
From: Lai, Yi @ 2024-09-24 3:25 UTC (permalink / raw)
To: Zhang Yi
Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3, yi1.lai
Hi Zhang Yi,
Greetings!
I used Syzkaller and found that there is WARNING in ext4_destroy_inode.
After bisection and the first bad commit is:
"
b37c907073e8 ext4: warn if delalloc counters are not zero on inactive
"
I understand that the commit is to add WARN_ON_ONCE to make error message more visible. I hope the reproduction program will be insightful for you.
All detailed into can be found at:
https://github.com/laifryiee/syzkaller_logs/tree/main/240923_043748_ext4_destroy_inode
Syzkaller repro code:
https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/repro.c
Syzkaller repro syscall steps:
https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/repro.prog
Syzkaller report:
https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/repro.report
Kconfig(make olddefconfig):
https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/kconfig_origin
Bisect info:
https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/bisect_info.log
bzImage:
https://github.com/laifryiee/syzkaller_logs/raw/main/240923_043748_ext4_destroy_inode/bzImage_98f7e32f20d28ec452afb208f9cffc08448a2652
Issue dmesg:
https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/98f7e32f20d28ec452afb208f9cffc08448a2652_dmesg.log
"
[ 25.223775] ------------[ cut here ]------------
[ 25.224177] WARNING: CPU: 0 PID: 740 at fs/ext4/super.c:1464 ext4_destroy_inode+0x1de/0x280
[ 25.224724] Modules linked in:
[ 25.224920] CPU: 0 UID: 0 PID: 740 Comm: repro Not tainted 6.11.0-98f7e32f20d2 #1
[ 25.225393] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 25.226103] RIP: 0010:ext4_destroy_inode+0x1de/0x280
[ 25.226429] Code: 31 ff 44 89 e6 e8 62 ad 45 ff 45 85 e4 75 16 e8 d8 a9 45 ff 48 8d 65 e0 5b 41 5c 41 5d 41 5e 5d c3 cc cc cc cc e8 c2 a9 45 ff <0f> 0b 48 8d 7b 40 4c 8d 83 50 fd ff ff 48 b8 00 00 00 00 00 fc ff
[ 25.227570] RSP: 0018:ff11000023707c08 EFLAGS: 00010293
[ 25.227915] RAX: 0000000000000000 RBX: ff11000022f22a50 RCX: ffffffff822028de
[ 25.228357] RDX: ff110000139a8000 RSI: ffffffff822028fe RDI: 0000000000000005
[ 25.228840] RBP: ff11000023707c30 R08: 0000000000000001 R09: ffe21c00024e24eb
[ 25.229284] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000001
[ 25.229712] R13: ff11000012712000 R14: ff11000022f22ad0 R15: ff1100006c1aa440
[ 25.230168] FS: 00007f1d418a7800(0000) GS:ff1100006c400000(0000) knlGS:0000000000000000
[ 25.230666] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 25.230818] EXT4-fs (sda): Inode 151593 (000000004419e1b8): i_reserved_data_blocks (1) not cleared!
[ 25.231037] CR2: 00007f1d416b1ac0 CR3: 00000000140e4004 CR4: 0000000000771ef0
[ 25.232104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 25.232546] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 25.233006] PKRU: 55555554
[ 25.233184] Call Trace:
[ 25.233348] <TASK>
[ 25.233489] ? show_regs+0xa8/0xc0
[ 25.233724] ? __warn+0xee/0x380
[ 25.233953] ? report_bug+0x25e/0x4b0
[ 25.234201] ? ext4_destroy_inode+0x1de/0x280
[ 25.234485] ? report_bug+0x2cb/0x4b0
[ 25.234729] ? ext4_destroy_inode+0x1de/0x280
[ 25.235020] ? handle_bug+0xa2/0x130
[ 25.235266] ? exc_invalid_op+0x3c/0x80
[ 25.235513] ? asm_exc_invalid_op+0x1f/0x30
[ 25.235786] ? ext4_destroy_inode+0x1be/0x280
[ 25.236072] ? ext4_destroy_inode+0x1de/0x280
[ 25.236356] ? ext4_destroy_inode+0x1de/0x280
[ 25.236637] ? ext4_destroy_inode+0x1de/0x280
[ 25.236949] ? __pfx_ext4_destroy_inode+0x10/0x10
[ 25.237257] destroy_inode+0xd6/0x1d0
[ 25.237507] evict+0x5a7/0x930
[ 25.237708] ? lock_release+0x441/0x870
[ 25.237975] ? do_raw_spin_lock+0x141/0x280
[ 25.238246] ? __pfx_evict+0x10/0x10
[ 25.238486] ? __pfx_lock_release+0x10/0x10
[ 25.238757] ? lock_release+0x441/0x870
[ 25.239015] ? lock_release+0x441/0x870
[ 25.239266] ? do_raw_spin_unlock+0x15c/0x210
[ 25.239552] iput.part.0+0x543/0x740
[ 25.239788] ? __pfx_ext4_drop_inode+0x10/0x10
[ 25.240081] iput+0x68/0x90
[ 25.240265] do_unlinkat+0x5dc/0x730
[ 25.240503] ? __pfx_do_unlinkat+0x10/0x10
[ 25.240791] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[ 25.241149] ? strncpy_from_user+0x1ef/0x2e0
[ 25.241436] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[ 25.241774] ? getname_flags.part.0+0x1d5/0x570
[ 25.242459] __x64_sys_unlink+0xd1/0x120
[ 25.242749] x64_sys_call+0x2014/0x20d0
[ 25.243031] do_syscall_64+0x6d/0x140
[ 25.243304] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 25.243630] RIP: 0033:0x7f1d4163eb7b
[ 25.243878] Code: f0 ff ff 73 01 c3 48 8b 0d a2 b2 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 57 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 b2 1b 00 f7 d8 64 89 01 48
[ 25.245038] RSP: 002b:00007fffffa2ca48 EFLAGS: 00000206 ORIG_RAX: 0000000000000057
[ 25.245508] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1d4163eb7b
[ 25.245966] RDX: 00007fffffa2ca60 RSI: 00007fffffa2caf0 RDI: 00007fffffa2caf0
[ 25.246412] RBP: 00007fffffa2db30 R08: 0000000000000000 R09: 00007fffffa2c8e0
[ 25.246872] R10: 00007f1d4160b208 R11: 0000000000000206 R12: 00007fffffa2dca8
[ 25.247310] R13: 0000000000402e4b R14: 0000000000404e08 R15: 00007f1d418f2000
[ 25.247759] </TASK>
[ 25.247912] irq event stamp: 5719
[ 25.248127] hardirqs last enabled at (5727): [<ffffffff81458eb4>] console_unlock+0x224/0x240
[ 25.248690] hardirqs last disabled at (5736): [<ffffffff81458e99>] console_unlock+0x209/0x240
[ 25.249236] softirqs last enabled at (5252): [<ffffffff81289d19>] __irq_exit_rcu+0xa9/0x120
[ 25.249768] softirqs last disabled at (5247): [<ffffffff81289d19>] __irq_exit_rcu+0xa9/0x120
[ 25.250311] ---[ end trace 0000000000000000 ]---
[ 25.250602] EXT4-fs (sda): Inode 151586 (00000000f9d6a315): i_reserved_data_blocks (1) not cleared!
[ 25.326263] EXT4-fs (sda): Inode 151578 (00000000d86ad2f9): i_reserved_data_blocks (1) not cleared!
[ 25.680884] EXT4-fs (sda): Inode 151596 (00000000da9177c9): i_reserved_data_blocks (1) not cleared!
[ 25.717550] EXT4-fs (sda): Inode 151573 (0000000088687caa): i_reserved_data_blocks (1) not cleared!
[ 25.726089] EXT4-fs (sda): Inode 151585 (000000005d7aed9a): i_reserved_data_blocks (1) not cleared!
[ 25.838592] EXT4-fs (sda): Inode 151573 (000000004af622df): i_reserved_data_blocks (1) not cleared!
[ 25.955073] EXT4-fs (sda): Inode 151598 (00000000a6e598ec): i_reserved_data_blocks (1) not cleared!
[ 26.525552] EXT4-fs (sda): Inode 151593 (0000000026aef1cd): i_reserved_data_blocks (1) not cleared!
[ 26.554067] EXT4-fs (sda): Inode 151591 (0000000051e990da): i_reserved_data_blocks (1) not cleared!
[ 30.291490] EXT4-fs: 14 callbacks suppressed
[ 30.291510] EXT4-fs (sda): Inode 151591 (0000000050be254a): i_reserved_data_blocks (1) not cleared!
[ 30.301238] EXT4-fs (sda): Inode 151587 (000000004ba9ad70): i_reserved_data_blocks (1) not cleared!
[ 30.414377] EXT4-fs (sda): Inode 151583 (00000000f6751ad3): i_reserved_data_blocks (1) not cleared!
[ 30.417213] EXT4-fs (sda): Inode 151591 (0000000090a0dce3): i_reserved_data_blocks (1) not cleared!
[ 30.537920] EXT4-fs (sda): Inode 151587 (00000000de72acf9): i_reserved_data_blocks (1) not cleared!
[ 30.645791] EXT4-fs (sda): Inode 151580 (00000000a40a052f): i_reserved_data_blocks (1) not cleared!
[ 30.665732] EXT4-fs (sda): Inode 151587 (00000000d9452edd): i_reserved_data_blocks (1) not cleared!
[ 30.670204] EXT4-fs (sda): Inode 151597 (00000000f861d75f): i_reserved_data_blocks (1) not cleared!
[ 31.964931] EXT4-fs (sda): Inode 151589 (000000009baa4064): i_reserved_data_blocks (1) not cleared!
[ 32.101343] EXT4-fs (sda): Inode 151598 (000000003fca6cd5): i_reserved_data_blocks (1) not cleared!
"
I hope you find it useful.
Regards,
Yi Lai
---
If you don't need the following environment to reproduce the problem or if you
already have one reproduced environment, please ignore the following information.
How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0
// start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
// You could change the bzImage_xxx as you want
// Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version
You could use below command to log in, there is no password for root.
ssh -p 10023 root@localhost
After login vm(virtual machine) successfully, you could transfer reproduced
binary to the vm by below way, and reproduce the problem in vm:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/
Get the bzImage for target kernel:
Please use target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage //x should equal or less than cpu num your pc has
Fill the bzImage file into above start3.sh to load the target kernel in vm.
Tips:
If you already have qemu-system-x86_64, please ignore below info.
If you want to install qemu v7.1.0 version:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install
On Fri, May 17, 2024 at 08:39:58PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
>
> The per-inode i_reserved_data_blocks count the reserved delalloc blocks
> in a regular file, it should be zero when destroying the file. The
> per-fs s_dirtyclusters_counter count all reserved delalloc blocks in a
> filesystem, it also should be zero when umounting the filesystem. Now we
> have only an error message if the i_reserved_data_blocks is not zero,
> which is unable to be simply captured, so add WARN_ON_ONCE to make it
> more visable.
>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> ---
> fs/ext4/super.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 044135796f2b..b68064c877e3 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1343,6 +1343,9 @@ static void ext4_put_super(struct super_block *sb)
>
> ext4_group_desc_free(sbi);
> ext4_flex_groups_free(sbi);
> +
> + WARN_ON_ONCE(!(sbi->s_mount_state & EXT4_ERROR_FS) &&
> + percpu_counter_sum(&sbi->s_dirtyclusters_counter));
> ext4_percpu_param_destroy(sbi);
> #ifdef CONFIG_QUOTA
> for (int i = 0; i < EXT4_MAXQUOTAS; i++)
> @@ -1473,7 +1476,8 @@ static void ext4_destroy_inode(struct inode *inode)
> dump_stack();
> }
>
> - if (EXT4_I(inode)->i_reserved_data_blocks)
> + if (!(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_ERROR_FS) &&
> + WARN_ON_ONCE(EXT4_I(inode)->i_reserved_data_blocks))
> ext4_msg(inode->i_sb, KERN_ERR,
> "Inode %lu (%p): i_reserved_data_blocks (%u) not cleared!",
> inode->i_ino, EXT4_I(inode),
> --
> 2.39.2
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v5 03/10] ext4: warn if delalloc counters are not zero on inactive
2024-09-24 3:25 ` Lai, Yi
@ 2024-09-24 8:38 ` Zhang Yi
2024-09-25 9:52 ` Lai, Yi
0 siblings, 1 reply; 20+ messages in thread
From: Zhang Yi @ 2024-09-24 8:38 UTC (permalink / raw)
To: Lai, Yi
Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3, yi1.lai
On 2024/9/24 11:25, Lai, Yi wrote:
> Hi Zhang Yi,
>
> Greetings!
>
> I used Syzkaller and found that there is WARNING in ext4_destroy_inode.
>
> After bisection and the first bad commit is:
> "
> b37c907073e8 ext4: warn if delalloc counters are not zero on inactive
> "
>
> I understand that the commit is to add WARN_ON_ONCE to make error message more visible. I hope the reproduction program will be insightful for you.
>
Thanks for the report! It seems that this patch worked, it start exposing
problems about inconsistent delalloc counter, which were previously hidden.
However, the counter updating logic has changed after this series:
https://lore.kernel.org/linux-ext4/20240813123452.2824659-1-yi.zhang@huaweicloud.com/
Could you reproduce this issue with this series or in the latest upstream
kernel?
Thanks,
Yi.
> All detailed into can be found at:
> https://github.com/laifryiee/syzkaller_logs/tree/main/240923_043748_ext4_destroy_inode
> Syzkaller repro code:
> https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/repro.c
> Syzkaller repro syscall steps:
> https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/repro.prog
> Syzkaller report:
> https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/repro.report
> Kconfig(make olddefconfig):
> https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/kconfig_origin
> Bisect info:
> https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/bisect_info.log
> bzImage:
> https://github.com/laifryiee/syzkaller_logs/raw/main/240923_043748_ext4_destroy_inode/bzImage_98f7e32f20d28ec452afb208f9cffc08448a2652
> Issue dmesg:
> https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/98f7e32f20d28ec452afb208f9cffc08448a2652_dmesg.log
>
> "
> [ 25.223775] ------------[ cut here ]------------
> [ 25.224177] WARNING: CPU: 0 PID: 740 at fs/ext4/super.c:1464 ext4_destroy_inode+0x1de/0x280
> [ 25.224724] Modules linked in:
> [ 25.224920] CPU: 0 UID: 0 PID: 740 Comm: repro Not tainted 6.11.0-98f7e32f20d2 #1
> [ 25.225393] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [ 25.226103] RIP: 0010:ext4_destroy_inode+0x1de/0x280
> [ 25.226429] Code: 31 ff 44 89 e6 e8 62 ad 45 ff 45 85 e4 75 16 e8 d8 a9 45 ff 48 8d 65 e0 5b 41 5c 41 5d 41 5e 5d c3 cc cc cc cc e8 c2 a9 45 ff <0f> 0b 48 8d 7b 40 4c 8d 83 50 fd ff ff 48 b8 00 00 00 00 00 fc ff
> [ 25.227570] RSP: 0018:ff11000023707c08 EFLAGS: 00010293
> [ 25.227915] RAX: 0000000000000000 RBX: ff11000022f22a50 RCX: ffffffff822028de
> [ 25.228357] RDX: ff110000139a8000 RSI: ffffffff822028fe RDI: 0000000000000005
> [ 25.228840] RBP: ff11000023707c30 R08: 0000000000000001 R09: ffe21c00024e24eb
> [ 25.229284] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000001
> [ 25.229712] R13: ff11000012712000 R14: ff11000022f22ad0 R15: ff1100006c1aa440
> [ 25.230168] FS: 00007f1d418a7800(0000) GS:ff1100006c400000(0000) knlGS:0000000000000000
> [ 25.230666] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 25.230818] EXT4-fs (sda): Inode 151593 (000000004419e1b8): i_reserved_data_blocks (1) not cleared!
> [ 25.231037] CR2: 00007f1d416b1ac0 CR3: 00000000140e4004 CR4: 0000000000771ef0
> [ 25.232104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 25.232546] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> [ 25.233006] PKRU: 55555554
> [ 25.233184] Call Trace:
> [ 25.233348] <TASK>
> [ 25.233489] ? show_regs+0xa8/0xc0
> [ 25.233724] ? __warn+0xee/0x380
> [ 25.233953] ? report_bug+0x25e/0x4b0
> [ 25.234201] ? ext4_destroy_inode+0x1de/0x280
> [ 25.234485] ? report_bug+0x2cb/0x4b0
> [ 25.234729] ? ext4_destroy_inode+0x1de/0x280
> [ 25.235020] ? handle_bug+0xa2/0x130
> [ 25.235266] ? exc_invalid_op+0x3c/0x80
> [ 25.235513] ? asm_exc_invalid_op+0x1f/0x30
> [ 25.235786] ? ext4_destroy_inode+0x1be/0x280
> [ 25.236072] ? ext4_destroy_inode+0x1de/0x280
> [ 25.236356] ? ext4_destroy_inode+0x1de/0x280
> [ 25.236637] ? ext4_destroy_inode+0x1de/0x280
> [ 25.236949] ? __pfx_ext4_destroy_inode+0x10/0x10
> [ 25.237257] destroy_inode+0xd6/0x1d0
> [ 25.237507] evict+0x5a7/0x930
> [ 25.237708] ? lock_release+0x441/0x870
> [ 25.237975] ? do_raw_spin_lock+0x141/0x280
> [ 25.238246] ? __pfx_evict+0x10/0x10
> [ 25.238486] ? __pfx_lock_release+0x10/0x10
> [ 25.238757] ? lock_release+0x441/0x870
> [ 25.239015] ? lock_release+0x441/0x870
> [ 25.239266] ? do_raw_spin_unlock+0x15c/0x210
> [ 25.239552] iput.part.0+0x543/0x740
> [ 25.239788] ? __pfx_ext4_drop_inode+0x10/0x10
> [ 25.240081] iput+0x68/0x90
> [ 25.240265] do_unlinkat+0x5dc/0x730
> [ 25.240503] ? __pfx_do_unlinkat+0x10/0x10
> [ 25.240791] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
> [ 25.241149] ? strncpy_from_user+0x1ef/0x2e0
> [ 25.241436] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
> [ 25.241774] ? getname_flags.part.0+0x1d5/0x570
> [ 25.242459] __x64_sys_unlink+0xd1/0x120
> [ 25.242749] x64_sys_call+0x2014/0x20d0
> [ 25.243031] do_syscall_64+0x6d/0x140
> [ 25.243304] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 25.243630] RIP: 0033:0x7f1d4163eb7b
> [ 25.243878] Code: f0 ff ff 73 01 c3 48 8b 0d a2 b2 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 57 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 b2 1b 00 f7 d8 64 89 01 48
> [ 25.245038] RSP: 002b:00007fffffa2ca48 EFLAGS: 00000206 ORIG_RAX: 0000000000000057
> [ 25.245508] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1d4163eb7b
> [ 25.245966] RDX: 00007fffffa2ca60 RSI: 00007fffffa2caf0 RDI: 00007fffffa2caf0
> [ 25.246412] RBP: 00007fffffa2db30 R08: 0000000000000000 R09: 00007fffffa2c8e0
> [ 25.246872] R10: 00007f1d4160b208 R11: 0000000000000206 R12: 00007fffffa2dca8
> [ 25.247310] R13: 0000000000402e4b R14: 0000000000404e08 R15: 00007f1d418f2000
> [ 25.247759] </TASK>
> [ 25.247912] irq event stamp: 5719
> [ 25.248127] hardirqs last enabled at (5727): [<ffffffff81458eb4>] console_unlock+0x224/0x240
> [ 25.248690] hardirqs last disabled at (5736): [<ffffffff81458e99>] console_unlock+0x209/0x240
> [ 25.249236] softirqs last enabled at (5252): [<ffffffff81289d19>] __irq_exit_rcu+0xa9/0x120
> [ 25.249768] softirqs last disabled at (5247): [<ffffffff81289d19>] __irq_exit_rcu+0xa9/0x120
> [ 25.250311] ---[ end trace 0000000000000000 ]---
> [ 25.250602] EXT4-fs (sda): Inode 151586 (00000000f9d6a315): i_reserved_data_blocks (1) not cleared!
> [ 25.326263] EXT4-fs (sda): Inode 151578 (00000000d86ad2f9): i_reserved_data_blocks (1) not cleared!
> [ 25.680884] EXT4-fs (sda): Inode 151596 (00000000da9177c9): i_reserved_data_blocks (1) not cleared!
> [ 25.717550] EXT4-fs (sda): Inode 151573 (0000000088687caa): i_reserved_data_blocks (1) not cleared!
> [ 25.726089] EXT4-fs (sda): Inode 151585 (000000005d7aed9a): i_reserved_data_blocks (1) not cleared!
> [ 25.838592] EXT4-fs (sda): Inode 151573 (000000004af622df): i_reserved_data_blocks (1) not cleared!
> [ 25.955073] EXT4-fs (sda): Inode 151598 (00000000a6e598ec): i_reserved_data_blocks (1) not cleared!
> [ 26.525552] EXT4-fs (sda): Inode 151593 (0000000026aef1cd): i_reserved_data_blocks (1) not cleared!
> [ 26.554067] EXT4-fs (sda): Inode 151591 (0000000051e990da): i_reserved_data_blocks (1) not cleared!
> [ 30.291490] EXT4-fs: 14 callbacks suppressed
> [ 30.291510] EXT4-fs (sda): Inode 151591 (0000000050be254a): i_reserved_data_blocks (1) not cleared!
> [ 30.301238] EXT4-fs (sda): Inode 151587 (000000004ba9ad70): i_reserved_data_blocks (1) not cleared!
> [ 30.414377] EXT4-fs (sda): Inode 151583 (00000000f6751ad3): i_reserved_data_blocks (1) not cleared!
> [ 30.417213] EXT4-fs (sda): Inode 151591 (0000000090a0dce3): i_reserved_data_blocks (1) not cleared!
> [ 30.537920] EXT4-fs (sda): Inode 151587 (00000000de72acf9): i_reserved_data_blocks (1) not cleared!
> [ 30.645791] EXT4-fs (sda): Inode 151580 (00000000a40a052f): i_reserved_data_blocks (1) not cleared!
> [ 30.665732] EXT4-fs (sda): Inode 151587 (00000000d9452edd): i_reserved_data_blocks (1) not cleared!
> [ 30.670204] EXT4-fs (sda): Inode 151597 (00000000f861d75f): i_reserved_data_blocks (1) not cleared!
> [ 31.964931] EXT4-fs (sda): Inode 151589 (000000009baa4064): i_reserved_data_blocks (1) not cleared!
> [ 32.101343] EXT4-fs (sda): Inode 151598 (000000003fca6cd5): i_reserved_data_blocks (1) not cleared!
> "
>
> I hope you find it useful.
>
> Regards,
> Yi Lai
>
> ---
>
> If you don't need the following environment to reproduce the problem or if you
> already have one reproduced environment, please ignore the following information.
>
> How to reproduce:
> git clone https://gitlab.com/xupengfe/repro_vm_env.git
> cd repro_vm_env
> tar -xvf repro_vm_env.tar.gz
> cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0
> // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
> // You could change the bzImage_xxx as you want
> // Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version
> You could use below command to log in, there is no password for root.
> ssh -p 10023 root@localhost
>
> After login vm(virtual machine) successfully, you could transfer reproduced
> binary to the vm by below way, and reproduce the problem in vm:
> gcc -pthread -o repro repro.c
> scp -P 10023 repro root@localhost:/root/
>
> Get the bzImage for target kernel:
> Please use target kconfig and copy it to kernel_src/.config
> make olddefconfig
> make -jx bzImage //x should equal or less than cpu num your pc has
>
> Fill the bzImage file into above start3.sh to load the target kernel in vm.
>
> Tips:
> If you already have qemu-system-x86_64, please ignore below info.
> If you want to install qemu v7.1.0 version:
> git clone https://github.com/qemu/qemu.git
> cd qemu
> git checkout -f v7.1.0
> mkdir build
> cd build
> yum install -y ninja-build.x86_64
> yum -y install libslirp-devel.x86_64
> ../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
> make
> make install
>
> On Fri, May 17, 2024 at 08:39:58PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> The per-inode i_reserved_data_blocks count the reserved delalloc blocks
>> in a regular file, it should be zero when destroying the file. The
>> per-fs s_dirtyclusters_counter count all reserved delalloc blocks in a
>> filesystem, it also should be zero when umounting the filesystem. Now we
>> have only an error message if the i_reserved_data_blocks is not zero,
>> which is unable to be simply captured, so add WARN_ON_ONCE to make it
>> more visable.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>> ---
>> fs/ext4/super.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>> index 044135796f2b..b68064c877e3 100644
>> --- a/fs/ext4/super.c
>> +++ b/fs/ext4/super.c
>> @@ -1343,6 +1343,9 @@ static void ext4_put_super(struct super_block *sb)
>>
>> ext4_group_desc_free(sbi);
>> ext4_flex_groups_free(sbi);
>> +
>> + WARN_ON_ONCE(!(sbi->s_mount_state & EXT4_ERROR_FS) &&
>> + percpu_counter_sum(&sbi->s_dirtyclusters_counter));
>> ext4_percpu_param_destroy(sbi);
>> #ifdef CONFIG_QUOTA
>> for (int i = 0; i < EXT4_MAXQUOTAS; i++)
>> @@ -1473,7 +1476,8 @@ static void ext4_destroy_inode(struct inode *inode)
>> dump_stack();
>> }
>>
>> - if (EXT4_I(inode)->i_reserved_data_blocks)
>> + if (!(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_ERROR_FS) &&
>> + WARN_ON_ONCE(EXT4_I(inode)->i_reserved_data_blocks))
>> ext4_msg(inode->i_sb, KERN_ERR,
>> "Inode %lu (%p): i_reserved_data_blocks (%u) not cleared!",
>> inode->i_ino, EXT4_I(inode),
>> --
>> 2.39.2
>>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v5 03/10] ext4: warn if delalloc counters are not zero on inactive
2024-09-24 8:38 ` Zhang Yi
@ 2024-09-25 9:52 ` Lai, Yi
2024-09-25 11:34 ` Zhang Yi
0 siblings, 1 reply; 20+ messages in thread
From: Lai, Yi @ 2024-09-25 9:52 UTC (permalink / raw)
To: Zhang Yi
Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3, yi1.lai
Hi,
I have applied your mentioned patch series on top of linux tag v6.11.
Using the same repro binary, issue cannot be reproduced.
Regards,
Yi Lai
On Tue, Sep 24, 2024 at 04:38:22PM +0800, Zhang Yi wrote:
> On 2024/9/24 11:25, Lai, Yi wrote:
> > Hi Zhang Yi,
> >
> > Greetings!
> >
> > I used Syzkaller and found that there is WARNING in ext4_destroy_inode.
> >
> > After bisection and the first bad commit is:
> > "
> > b37c907073e8 ext4: warn if delalloc counters are not zero on inactive
> > "
> >
> > I understand that the commit is to add WARN_ON_ONCE to make error message more visible. I hope the reproduction program will be insightful for you.
> >
>
> Thanks for the report! It seems that this patch worked, it start exposing
> problems about inconsistent delalloc counter, which were previously hidden.
> However, the counter updating logic has changed after this series:
>
> https://lore.kernel.org/linux-ext4/20240813123452.2824659-1-yi.zhang@huaweicloud.com/
>
> Could you reproduce this issue with this series or in the latest upstream
> kernel?
>
> Thanks,
> Yi.
>
> > All detailed into can be found at:
> > https://github.com/laifryiee/syzkaller_logs/tree/main/240923_043748_ext4_destroy_inode
> > Syzkaller repro code:
> > https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/repro.c
> > Syzkaller repro syscall steps:
> > https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/repro.prog
> > Syzkaller report:
> > https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/repro.report
> > Kconfig(make olddefconfig):
> > https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/kconfig_origin
> > Bisect info:
> > https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/bisect_info.log
> > bzImage:
> > https://github.com/laifryiee/syzkaller_logs/raw/main/240923_043748_ext4_destroy_inode/bzImage_98f7e32f20d28ec452afb208f9cffc08448a2652
> > Issue dmesg:
> > https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/98f7e32f20d28ec452afb208f9cffc08448a2652_dmesg.log
> >
> > "
> > [ 25.223775] ------------[ cut here ]------------
> > [ 25.224177] WARNING: CPU: 0 PID: 740 at fs/ext4/super.c:1464 ext4_destroy_inode+0x1de/0x280
> > [ 25.224724] Modules linked in:
> > [ 25.224920] CPU: 0 UID: 0 PID: 740 Comm: repro Not tainted 6.11.0-98f7e32f20d2 #1
> > [ 25.225393] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> > [ 25.226103] RIP: 0010:ext4_destroy_inode+0x1de/0x280
> > [ 25.226429] Code: 31 ff 44 89 e6 e8 62 ad 45 ff 45 85 e4 75 16 e8 d8 a9 45 ff 48 8d 65 e0 5b 41 5c 41 5d 41 5e 5d c3 cc cc cc cc e8 c2 a9 45 ff <0f> 0b 48 8d 7b 40 4c 8d 83 50 fd ff ff 48 b8 00 00 00 00 00 fc ff
> > [ 25.227570] RSP: 0018:ff11000023707c08 EFLAGS: 00010293
> > [ 25.227915] RAX: 0000000000000000 RBX: ff11000022f22a50 RCX: ffffffff822028de
> > [ 25.228357] RDX: ff110000139a8000 RSI: ffffffff822028fe RDI: 0000000000000005
> > [ 25.228840] RBP: ff11000023707c30 R08: 0000000000000001 R09: ffe21c00024e24eb
> > [ 25.229284] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000001
> > [ 25.229712] R13: ff11000012712000 R14: ff11000022f22ad0 R15: ff1100006c1aa440
> > [ 25.230168] FS: 00007f1d418a7800(0000) GS:ff1100006c400000(0000) knlGS:0000000000000000
> > [ 25.230666] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 25.230818] EXT4-fs (sda): Inode 151593 (000000004419e1b8): i_reserved_data_blocks (1) not cleared!
> > [ 25.231037] CR2: 00007f1d416b1ac0 CR3: 00000000140e4004 CR4: 0000000000771ef0
> > [ 25.232104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 25.232546] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> > [ 25.233006] PKRU: 55555554
> > [ 25.233184] Call Trace:
> > [ 25.233348] <TASK>
> > [ 25.233489] ? show_regs+0xa8/0xc0
> > [ 25.233724] ? __warn+0xee/0x380
> > [ 25.233953] ? report_bug+0x25e/0x4b0
> > [ 25.234201] ? ext4_destroy_inode+0x1de/0x280
> > [ 25.234485] ? report_bug+0x2cb/0x4b0
> > [ 25.234729] ? ext4_destroy_inode+0x1de/0x280
> > [ 25.235020] ? handle_bug+0xa2/0x130
> > [ 25.235266] ? exc_invalid_op+0x3c/0x80
> > [ 25.235513] ? asm_exc_invalid_op+0x1f/0x30
> > [ 25.235786] ? ext4_destroy_inode+0x1be/0x280
> > [ 25.236072] ? ext4_destroy_inode+0x1de/0x280
> > [ 25.236356] ? ext4_destroy_inode+0x1de/0x280
> > [ 25.236637] ? ext4_destroy_inode+0x1de/0x280
> > [ 25.236949] ? __pfx_ext4_destroy_inode+0x10/0x10
> > [ 25.237257] destroy_inode+0xd6/0x1d0
> > [ 25.237507] evict+0x5a7/0x930
> > [ 25.237708] ? lock_release+0x441/0x870
> > [ 25.237975] ? do_raw_spin_lock+0x141/0x280
> > [ 25.238246] ? __pfx_evict+0x10/0x10
> > [ 25.238486] ? __pfx_lock_release+0x10/0x10
> > [ 25.238757] ? lock_release+0x441/0x870
> > [ 25.239015] ? lock_release+0x441/0x870
> > [ 25.239266] ? do_raw_spin_unlock+0x15c/0x210
> > [ 25.239552] iput.part.0+0x543/0x740
> > [ 25.239788] ? __pfx_ext4_drop_inode+0x10/0x10
> > [ 25.240081] iput+0x68/0x90
> > [ 25.240265] do_unlinkat+0x5dc/0x730
> > [ 25.240503] ? __pfx_do_unlinkat+0x10/0x10
> > [ 25.240791] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
> > [ 25.241149] ? strncpy_from_user+0x1ef/0x2e0
> > [ 25.241436] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
> > [ 25.241774] ? getname_flags.part.0+0x1d5/0x570
> > [ 25.242459] __x64_sys_unlink+0xd1/0x120
> > [ 25.242749] x64_sys_call+0x2014/0x20d0
> > [ 25.243031] do_syscall_64+0x6d/0x140
> > [ 25.243304] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [ 25.243630] RIP: 0033:0x7f1d4163eb7b
> > [ 25.243878] Code: f0 ff ff 73 01 c3 48 8b 0d a2 b2 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 57 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 b2 1b 00 f7 d8 64 89 01 48
> > [ 25.245038] RSP: 002b:00007fffffa2ca48 EFLAGS: 00000206 ORIG_RAX: 0000000000000057
> > [ 25.245508] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1d4163eb7b
> > [ 25.245966] RDX: 00007fffffa2ca60 RSI: 00007fffffa2caf0 RDI: 00007fffffa2caf0
> > [ 25.246412] RBP: 00007fffffa2db30 R08: 0000000000000000 R09: 00007fffffa2c8e0
> > [ 25.246872] R10: 00007f1d4160b208 R11: 0000000000000206 R12: 00007fffffa2dca8
> > [ 25.247310] R13: 0000000000402e4b R14: 0000000000404e08 R15: 00007f1d418f2000
> > [ 25.247759] </TASK>
> > [ 25.247912] irq event stamp: 5719
> > [ 25.248127] hardirqs last enabled at (5727): [<ffffffff81458eb4>] console_unlock+0x224/0x240
> > [ 25.248690] hardirqs last disabled at (5736): [<ffffffff81458e99>] console_unlock+0x209/0x240
> > [ 25.249236] softirqs last enabled at (5252): [<ffffffff81289d19>] __irq_exit_rcu+0xa9/0x120
> > [ 25.249768] softirqs last disabled at (5247): [<ffffffff81289d19>] __irq_exit_rcu+0xa9/0x120
> > [ 25.250311] ---[ end trace 0000000000000000 ]---
> > [ 25.250602] EXT4-fs (sda): Inode 151586 (00000000f9d6a315): i_reserved_data_blocks (1) not cleared!
> > [ 25.326263] EXT4-fs (sda): Inode 151578 (00000000d86ad2f9): i_reserved_data_blocks (1) not cleared!
> > [ 25.680884] EXT4-fs (sda): Inode 151596 (00000000da9177c9): i_reserved_data_blocks (1) not cleared!
> > [ 25.717550] EXT4-fs (sda): Inode 151573 (0000000088687caa): i_reserved_data_blocks (1) not cleared!
> > [ 25.726089] EXT4-fs (sda): Inode 151585 (000000005d7aed9a): i_reserved_data_blocks (1) not cleared!
> > [ 25.838592] EXT4-fs (sda): Inode 151573 (000000004af622df): i_reserved_data_blocks (1) not cleared!
> > [ 25.955073] EXT4-fs (sda): Inode 151598 (00000000a6e598ec): i_reserved_data_blocks (1) not cleared!
> > [ 26.525552] EXT4-fs (sda): Inode 151593 (0000000026aef1cd): i_reserved_data_blocks (1) not cleared!
> > [ 26.554067] EXT4-fs (sda): Inode 151591 (0000000051e990da): i_reserved_data_blocks (1) not cleared!
> > [ 30.291490] EXT4-fs: 14 callbacks suppressed
> > [ 30.291510] EXT4-fs (sda): Inode 151591 (0000000050be254a): i_reserved_data_blocks (1) not cleared!
> > [ 30.301238] EXT4-fs (sda): Inode 151587 (000000004ba9ad70): i_reserved_data_blocks (1) not cleared!
> > [ 30.414377] EXT4-fs (sda): Inode 151583 (00000000f6751ad3): i_reserved_data_blocks (1) not cleared!
> > [ 30.417213] EXT4-fs (sda): Inode 151591 (0000000090a0dce3): i_reserved_data_blocks (1) not cleared!
> > [ 30.537920] EXT4-fs (sda): Inode 151587 (00000000de72acf9): i_reserved_data_blocks (1) not cleared!
> > [ 30.645791] EXT4-fs (sda): Inode 151580 (00000000a40a052f): i_reserved_data_blocks (1) not cleared!
> > [ 30.665732] EXT4-fs (sda): Inode 151587 (00000000d9452edd): i_reserved_data_blocks (1) not cleared!
> > [ 30.670204] EXT4-fs (sda): Inode 151597 (00000000f861d75f): i_reserved_data_blocks (1) not cleared!
> > [ 31.964931] EXT4-fs (sda): Inode 151589 (000000009baa4064): i_reserved_data_blocks (1) not cleared!
> > [ 32.101343] EXT4-fs (sda): Inode 151598 (000000003fca6cd5): i_reserved_data_blocks (1) not cleared!
> > "
> >
> > I hope you find it useful.
> >
> > Regards,
> > Yi Lai
> >
> > ---
> >
> > If you don't need the following environment to reproduce the problem or if you
> > already have one reproduced environment, please ignore the following information.
> >
> > How to reproduce:
> > git clone https://gitlab.com/xupengfe/repro_vm_env.git
> > cd repro_vm_env
> > tar -xvf repro_vm_env.tar.gz
> > cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0
> > // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
> > // You could change the bzImage_xxx as you want
> > // Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version
> > You could use below command to log in, there is no password for root.
> > ssh -p 10023 root@localhost
> >
> > After login vm(virtual machine) successfully, you could transfer reproduced
> > binary to the vm by below way, and reproduce the problem in vm:
> > gcc -pthread -o repro repro.c
> > scp -P 10023 repro root@localhost:/root/
> >
> > Get the bzImage for target kernel:
> > Please use target kconfig and copy it to kernel_src/.config
> > make olddefconfig
> > make -jx bzImage //x should equal or less than cpu num your pc has
> >
> > Fill the bzImage file into above start3.sh to load the target kernel in vm.
> >
> > Tips:
> > If you already have qemu-system-x86_64, please ignore below info.
> > If you want to install qemu v7.1.0 version:
> > git clone https://github.com/qemu/qemu.git
> > cd qemu
> > git checkout -f v7.1.0
> > mkdir build
> > cd build
> > yum install -y ninja-build.x86_64
> > yum -y install libslirp-devel.x86_64
> > ../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
> > make
> > make install
> >
> > On Fri, May 17, 2024 at 08:39:58PM +0800, Zhang Yi wrote:
> >> From: Zhang Yi <yi.zhang@huawei.com>
> >>
> >> The per-inode i_reserved_data_blocks count the reserved delalloc blocks
> >> in a regular file, it should be zero when destroying the file. The
> >> per-fs s_dirtyclusters_counter count all reserved delalloc blocks in a
> >> filesystem, it also should be zero when umounting the filesystem. Now we
> >> have only an error message if the i_reserved_data_blocks is not zero,
> >> which is unable to be simply captured, so add WARN_ON_ONCE to make it
> >> more visable.
> >>
> >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> >> ---
> >> fs/ext4/super.c | 6 +++++-
> >> 1 file changed, 5 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> >> index 044135796f2b..b68064c877e3 100644
> >> --- a/fs/ext4/super.c
> >> +++ b/fs/ext4/super.c
> >> @@ -1343,6 +1343,9 @@ static void ext4_put_super(struct super_block *sb)
> >>
> >> ext4_group_desc_free(sbi);
> >> ext4_flex_groups_free(sbi);
> >> +
> >> + WARN_ON_ONCE(!(sbi->s_mount_state & EXT4_ERROR_FS) &&
> >> + percpu_counter_sum(&sbi->s_dirtyclusters_counter));
> >> ext4_percpu_param_destroy(sbi);
> >> #ifdef CONFIG_QUOTA
> >> for (int i = 0; i < EXT4_MAXQUOTAS; i++)
> >> @@ -1473,7 +1476,8 @@ static void ext4_destroy_inode(struct inode *inode)
> >> dump_stack();
> >> }
> >>
> >> - if (EXT4_I(inode)->i_reserved_data_blocks)
> >> + if (!(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_ERROR_FS) &&
> >> + WARN_ON_ONCE(EXT4_I(inode)->i_reserved_data_blocks))
> >> ext4_msg(inode->i_sb, KERN_ERR,
> >> "Inode %lu (%p): i_reserved_data_blocks (%u) not cleared!",
> >> inode->i_ino, EXT4_I(inode),
> >> --
> >> 2.39.2
> >>
> >
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v5 03/10] ext4: warn if delalloc counters are not zero on inactive
2024-09-25 9:52 ` Lai, Yi
@ 2024-09-25 11:34 ` Zhang Yi
0 siblings, 0 replies; 20+ messages in thread
From: Zhang Yi @ 2024-09-25 11:34 UTC (permalink / raw)
To: Lai, Yi
Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3, yi1.lai
On 2024/9/25 17:52, Lai, Yi wrote:
> Hi,
>
> I have applied your mentioned patch series on top of linux tag v6.11.
> Using the same repro binary, issue cannot be reproduced.
>
Ah, that's great, this series seems had fixed some corner problems
as well, thanks a lot for testing this.
Thanks,
Yi.
> Regards,
> Yi Lai
>
> On Tue, Sep 24, 2024 at 04:38:22PM +0800, Zhang Yi wrote:
>> On 2024/9/24 11:25, Lai, Yi wrote:
>>> Hi Zhang Yi,
>>>
>>> Greetings!
>>>
>>> I used Syzkaller and found that there is WARNING in ext4_destroy_inode.
>>>
>>> After bisection and the first bad commit is:
>>> "
>>> b37c907073e8 ext4: warn if delalloc counters are not zero on inactive
>>> "
>>>
>>> I understand that the commit is to add WARN_ON_ONCE to make error message more visible. I hope the reproduction program will be insightful for you.
>>>
>>
>> Thanks for the report! It seems that this patch worked, it start exposing
>> problems about inconsistent delalloc counter, which were previously hidden.
>> However, the counter updating logic has changed after this series:
>>
>> https://lore.kernel.org/linux-ext4/20240813123452.2824659-1-yi.zhang@huaweicloud.com/
>>
>> Could you reproduce this issue with this series or in the latest upstream
>> kernel?
>>
>> Thanks,
>> Yi.
>>
>>> All detailed into can be found at:
>>> https://github.com/laifryiee/syzkaller_logs/tree/main/240923_043748_ext4_destroy_inode
>>> Syzkaller repro code:
>>> https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/repro.c
>>> Syzkaller repro syscall steps:
>>> https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/repro.prog
>>> Syzkaller report:
>>> https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/repro.report
>>> Kconfig(make olddefconfig):
>>> https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/kconfig_origin
>>> Bisect info:
>>> https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/bisect_info.log
>>> bzImage:
>>> https://github.com/laifryiee/syzkaller_logs/raw/main/240923_043748_ext4_destroy_inode/bzImage_98f7e32f20d28ec452afb208f9cffc08448a2652
>>> Issue dmesg:
>>> https://github.com/laifryiee/syzkaller_logs/blob/main/240923_043748_ext4_destroy_inode/98f7e32f20d28ec452afb208f9cffc08448a2652_dmesg.log
>>>
>>> "
>>> [ 25.223775] ------------[ cut here ]------------
>>> [ 25.224177] WARNING: CPU: 0 PID: 740 at fs/ext4/super.c:1464 ext4_destroy_inode+0x1de/0x280
>>> [ 25.224724] Modules linked in:
>>> [ 25.224920] CPU: 0 UID: 0 PID: 740 Comm: repro Not tainted 6.11.0-98f7e32f20d2 #1
>>> [ 25.225393] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
>>> [ 25.226103] RIP: 0010:ext4_destroy_inode+0x1de/0x280
>>> [ 25.226429] Code: 31 ff 44 89 e6 e8 62 ad 45 ff 45 85 e4 75 16 e8 d8 a9 45 ff 48 8d 65 e0 5b 41 5c 41 5d 41 5e 5d c3 cc cc cc cc e8 c2 a9 45 ff <0f> 0b 48 8d 7b 40 4c 8d 83 50 fd ff ff 48 b8 00 00 00 00 00 fc ff
>>> [ 25.227570] RSP: 0018:ff11000023707c08 EFLAGS: 00010293
>>> [ 25.227915] RAX: 0000000000000000 RBX: ff11000022f22a50 RCX: ffffffff822028de
>>> [ 25.228357] RDX: ff110000139a8000 RSI: ffffffff822028fe RDI: 0000000000000005
>>> [ 25.228840] RBP: ff11000023707c30 R08: 0000000000000001 R09: ffe21c00024e24eb
>>> [ 25.229284] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000001
>>> [ 25.229712] R13: ff11000012712000 R14: ff11000022f22ad0 R15: ff1100006c1aa440
>>> [ 25.230168] FS: 00007f1d418a7800(0000) GS:ff1100006c400000(0000) knlGS:0000000000000000
>>> [ 25.230666] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 25.230818] EXT4-fs (sda): Inode 151593 (000000004419e1b8): i_reserved_data_blocks (1) not cleared!
>>> [ 25.231037] CR2: 00007f1d416b1ac0 CR3: 00000000140e4004 CR4: 0000000000771ef0
>>> [ 25.232104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [ 25.232546] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
>>> [ 25.233006] PKRU: 55555554
>>> [ 25.233184] Call Trace:
>>> [ 25.233348] <TASK>
>>> [ 25.233489] ? show_regs+0xa8/0xc0
>>> [ 25.233724] ? __warn+0xee/0x380
>>> [ 25.233953] ? report_bug+0x25e/0x4b0
>>> [ 25.234201] ? ext4_destroy_inode+0x1de/0x280
>>> [ 25.234485] ? report_bug+0x2cb/0x4b0
>>> [ 25.234729] ? ext4_destroy_inode+0x1de/0x280
>>> [ 25.235020] ? handle_bug+0xa2/0x130
>>> [ 25.235266] ? exc_invalid_op+0x3c/0x80
>>> [ 25.235513] ? asm_exc_invalid_op+0x1f/0x30
>>> [ 25.235786] ? ext4_destroy_inode+0x1be/0x280
>>> [ 25.236072] ? ext4_destroy_inode+0x1de/0x280
>>> [ 25.236356] ? ext4_destroy_inode+0x1de/0x280
>>> [ 25.236637] ? ext4_destroy_inode+0x1de/0x280
>>> [ 25.236949] ? __pfx_ext4_destroy_inode+0x10/0x10
>>> [ 25.237257] destroy_inode+0xd6/0x1d0
>>> [ 25.237507] evict+0x5a7/0x930
>>> [ 25.237708] ? lock_release+0x441/0x870
>>> [ 25.237975] ? do_raw_spin_lock+0x141/0x280
>>> [ 25.238246] ? __pfx_evict+0x10/0x10
>>> [ 25.238486] ? __pfx_lock_release+0x10/0x10
>>> [ 25.238757] ? lock_release+0x441/0x870
>>> [ 25.239015] ? lock_release+0x441/0x870
>>> [ 25.239266] ? do_raw_spin_unlock+0x15c/0x210
>>> [ 25.239552] iput.part.0+0x543/0x740
>>> [ 25.239788] ? __pfx_ext4_drop_inode+0x10/0x10
>>> [ 25.240081] iput+0x68/0x90
>>> [ 25.240265] do_unlinkat+0x5dc/0x730
>>> [ 25.240503] ? __pfx_do_unlinkat+0x10/0x10
>>> [ 25.240791] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
>>> [ 25.241149] ? strncpy_from_user+0x1ef/0x2e0
>>> [ 25.241436] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
>>> [ 25.241774] ? getname_flags.part.0+0x1d5/0x570
>>> [ 25.242459] __x64_sys_unlink+0xd1/0x120
>>> [ 25.242749] x64_sys_call+0x2014/0x20d0
>>> [ 25.243031] do_syscall_64+0x6d/0x140
>>> [ 25.243304] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>> [ 25.243630] RIP: 0033:0x7f1d4163eb7b
>>> [ 25.243878] Code: f0 ff ff 73 01 c3 48 8b 0d a2 b2 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 57 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 b2 1b 00 f7 d8 64 89 01 48
>>> [ 25.245038] RSP: 002b:00007fffffa2ca48 EFLAGS: 00000206 ORIG_RAX: 0000000000000057
>>> [ 25.245508] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1d4163eb7b
>>> [ 25.245966] RDX: 00007fffffa2ca60 RSI: 00007fffffa2caf0 RDI: 00007fffffa2caf0
>>> [ 25.246412] RBP: 00007fffffa2db30 R08: 0000000000000000 R09: 00007fffffa2c8e0
>>> [ 25.246872] R10: 00007f1d4160b208 R11: 0000000000000206 R12: 00007fffffa2dca8
>>> [ 25.247310] R13: 0000000000402e4b R14: 0000000000404e08 R15: 00007f1d418f2000
>>> [ 25.247759] </TASK>
>>> [ 25.247912] irq event stamp: 5719
>>> [ 25.248127] hardirqs last enabled at (5727): [<ffffffff81458eb4>] console_unlock+0x224/0x240
>>> [ 25.248690] hardirqs last disabled at (5736): [<ffffffff81458e99>] console_unlock+0x209/0x240
>>> [ 25.249236] softirqs last enabled at (5252): [<ffffffff81289d19>] __irq_exit_rcu+0xa9/0x120
>>> [ 25.249768] softirqs last disabled at (5247): [<ffffffff81289d19>] __irq_exit_rcu+0xa9/0x120
>>> [ 25.250311] ---[ end trace 0000000000000000 ]---
>>> [ 25.250602] EXT4-fs (sda): Inode 151586 (00000000f9d6a315): i_reserved_data_blocks (1) not cleared!
>>> [ 25.326263] EXT4-fs (sda): Inode 151578 (00000000d86ad2f9): i_reserved_data_blocks (1) not cleared!
>>> [ 25.680884] EXT4-fs (sda): Inode 151596 (00000000da9177c9): i_reserved_data_blocks (1) not cleared!
>>> [ 25.717550] EXT4-fs (sda): Inode 151573 (0000000088687caa): i_reserved_data_blocks (1) not cleared!
>>> [ 25.726089] EXT4-fs (sda): Inode 151585 (000000005d7aed9a): i_reserved_data_blocks (1) not cleared!
>>> [ 25.838592] EXT4-fs (sda): Inode 151573 (000000004af622df): i_reserved_data_blocks (1) not cleared!
>>> [ 25.955073] EXT4-fs (sda): Inode 151598 (00000000a6e598ec): i_reserved_data_blocks (1) not cleared!
>>> [ 26.525552] EXT4-fs (sda): Inode 151593 (0000000026aef1cd): i_reserved_data_blocks (1) not cleared!
>>> [ 26.554067] EXT4-fs (sda): Inode 151591 (0000000051e990da): i_reserved_data_blocks (1) not cleared!
>>> [ 30.291490] EXT4-fs: 14 callbacks suppressed
>>> [ 30.291510] EXT4-fs (sda): Inode 151591 (0000000050be254a): i_reserved_data_blocks (1) not cleared!
>>> [ 30.301238] EXT4-fs (sda): Inode 151587 (000000004ba9ad70): i_reserved_data_blocks (1) not cleared!
>>> [ 30.414377] EXT4-fs (sda): Inode 151583 (00000000f6751ad3): i_reserved_data_blocks (1) not cleared!
>>> [ 30.417213] EXT4-fs (sda): Inode 151591 (0000000090a0dce3): i_reserved_data_blocks (1) not cleared!
>>> [ 30.537920] EXT4-fs (sda): Inode 151587 (00000000de72acf9): i_reserved_data_blocks (1) not cleared!
>>> [ 30.645791] EXT4-fs (sda): Inode 151580 (00000000a40a052f): i_reserved_data_blocks (1) not cleared!
>>> [ 30.665732] EXT4-fs (sda): Inode 151587 (00000000d9452edd): i_reserved_data_blocks (1) not cleared!
>>> [ 30.670204] EXT4-fs (sda): Inode 151597 (00000000f861d75f): i_reserved_data_blocks (1) not cleared!
>>> [ 31.964931] EXT4-fs (sda): Inode 151589 (000000009baa4064): i_reserved_data_blocks (1) not cleared!
>>> [ 32.101343] EXT4-fs (sda): Inode 151598 (000000003fca6cd5): i_reserved_data_blocks (1) not cleared!
>>> "
>>>
>>> I hope you find it useful.
>>>
>>> Regards,
>>> Yi Lai
>>>
>>> ---
>>>
>>> If you don't need the following environment to reproduce the problem or if you
>>> already have one reproduced environment, please ignore the following information.
>>>
>>> How to reproduce:
>>> git clone https://gitlab.com/xupengfe/repro_vm_env.git
>>> cd repro_vm_env
>>> tar -xvf repro_vm_env.tar.gz
>>> cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0
>>> // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
>>> // You could change the bzImage_xxx as you want
>>> // Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version
>>> You could use below command to log in, there is no password for root.
>>> ssh -p 10023 root@localhost
>>>
>>> After login vm(virtual machine) successfully, you could transfer reproduced
>>> binary to the vm by below way, and reproduce the problem in vm:
>>> gcc -pthread -o repro repro.c
>>> scp -P 10023 repro root@localhost:/root/
>>>
>>> Get the bzImage for target kernel:
>>> Please use target kconfig and copy it to kernel_src/.config
>>> make olddefconfig
>>> make -jx bzImage //x should equal or less than cpu num your pc has
>>>
>>> Fill the bzImage file into above start3.sh to load the target kernel in vm.
>>>
>>> Tips:
>>> If you already have qemu-system-x86_64, please ignore below info.
>>> If you want to install qemu v7.1.0 version:
>>> git clone https://github.com/qemu/qemu.git
>>> cd qemu
>>> git checkout -f v7.1.0
>>> mkdir build
>>> cd build
>>> yum install -y ninja-build.x86_64
>>> yum -y install libslirp-devel.x86_64
>>> ../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
>>> make
>>> make install
>>>
>>> On Fri, May 17, 2024 at 08:39:58PM +0800, Zhang Yi wrote:
>>>> From: Zhang Yi <yi.zhang@huawei.com>
>>>>
>>>> The per-inode i_reserved_data_blocks count the reserved delalloc blocks
>>>> in a regular file, it should be zero when destroying the file. The
>>>> per-fs s_dirtyclusters_counter count all reserved delalloc blocks in a
>>>> filesystem, it also should be zero when umounting the filesystem. Now we
>>>> have only an error message if the i_reserved_data_blocks is not zero,
>>>> which is unable to be simply captured, so add WARN_ON_ONCE to make it
>>>> more visable.
>>>>
>>>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>>>> ---
>>>> fs/ext4/super.c | 6 +++++-
>>>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>>>> index 044135796f2b..b68064c877e3 100644
>>>> --- a/fs/ext4/super.c
>>>> +++ b/fs/ext4/super.c
>>>> @@ -1343,6 +1343,9 @@ static void ext4_put_super(struct super_block *sb)
>>>>
>>>> ext4_group_desc_free(sbi);
>>>> ext4_flex_groups_free(sbi);
>>>> +
>>>> + WARN_ON_ONCE(!(sbi->s_mount_state & EXT4_ERROR_FS) &&
>>>> + percpu_counter_sum(&sbi->s_dirtyclusters_counter));
>>>> ext4_percpu_param_destroy(sbi);
>>>> #ifdef CONFIG_QUOTA
>>>> for (int i = 0; i < EXT4_MAXQUOTAS; i++)
>>>> @@ -1473,7 +1476,8 @@ static void ext4_destroy_inode(struct inode *inode)
>>>> dump_stack();
>>>> }
>>>>
>>>> - if (EXT4_I(inode)->i_reserved_data_blocks)
>>>> + if (!(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_ERROR_FS) &&
>>>> + WARN_ON_ONCE(EXT4_I(inode)->i_reserved_data_blocks))
>>>> ext4_msg(inode->i_sb, KERN_ERR,
>>>> "Inode %lu (%p): i_reserved_data_blocks (%u) not cleared!",
>>>> inode->i_ino, EXT4_I(inode),
>>>> --
>>>> 2.39.2
>>>>
>>>
>>
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v5 04/10] ext4: trim delalloc extent
2024-05-17 12:39 [PATCH v5 00/10] ext4: support adding multi-delalloc blocks Zhang Yi
` (2 preceding siblings ...)
2024-05-17 12:39 ` [PATCH v5 03/10] ext4: warn if delalloc counters are not zero on inactive Zhang Yi
@ 2024-05-17 12:39 ` Zhang Yi
2024-05-17 12:40 ` [PATCH v5 05/10] ext4: drop iblock parameter Zhang Yi
` (6 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Zhang Yi @ 2024-05-17 12:39 UTC (permalink / raw)
To: linux-ext4
Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3
From: Zhang Yi <yi.zhang@huawei.com>
In ext4_da_map_blocks(), we could find four kind of extents in the
extent status tree: hole, unwritten, written and delayed extent. Now we
only trim the map len if we found an unwritten extent or a written
extent. This is okay now since map->m_len is always set to one and we
always insert one delayed block at a time. But this will become isn't
okay for other two cases if ext4_insert_delayed_block() and
ext4_da_map_blocks() support inserting multiple map->len blocks later.
1. If we found a hole in the extent status tree which es->es_len is
shorter than the length we want to write, we should trim the
map->m_len to prevent adding extra delay more blocks than we
expected. For example, assume we write data [A, C) to a file that
contains a hole extent [A, B) and a written extent [B, D) in the
cache.
A B C D
before da write: ...hhhhhh|wwwwww....
Then we will get extent [A, B), we should trim map->m_len to B-A
before inserting new delalloc blocks, if not, the range [B, C) will
be duplicated.
2. If we found a delayed extent in the extent status tree which
es->es_len is shorter than the length we want to write, we should
trim the map->m_len to es->es_len and return directly since the front
part of this map has been delayed, we can't insert the delalloc
extent that contains the latter part in this round, we should return
the delayed length and the caller should increase the position and
call ext4_da_map_blocks() again. For example, assume we write data
[A, C) to a file that contains a delayed extent [A, B) in the cache.
A B C
before da write: ...dddddd|hhh....
Then we will get delayed extent [A, B), we should also trim map->m_len
to B-A and return, if not, we will incorrectly assume that the write
is complete and won't insert [B, C).
So we need to always trim the map->m_len if the found es->es_len in the
extent status tree is shorter than the map->m_len, prearing for
inserting a extent with multiple delalloc blocks. This patch only does a
pre-fix, the handle is crude and ext4_da_map_blocks() deserve a cleanup,
we will do that later.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
fs/ext4/inode.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6114ca79f464..fb76016e2ab7 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1734,6 +1734,11 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
/* Lookup extent status tree firstly */
if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
+ retval = es.es_len - (iblock - es.es_lblk);
+ if (retval > map->m_len)
+ retval = map->m_len;
+ map->m_len = retval;
+
if (ext4_es_is_hole(&es))
goto add_delayed;
@@ -1750,10 +1755,6 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
}
map->m_pblk = ext4_es_pblock(&es) + iblock - es.es_lblk;
- retval = es.es_len - (iblock - es.es_lblk);
- if (retval > map->m_len)
- retval = map->m_len;
- map->m_len = retval;
if (ext4_es_is_written(&es))
map->m_flags |= EXT4_MAP_MAPPED;
else if (ext4_es_is_unwritten(&es))
@@ -1790,6 +1791,11 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
* the extent status tree.
*/
if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
+ retval = es.es_len - (iblock - es.es_lblk);
+ if (retval > map->m_len)
+ retval = map->m_len;
+ map->m_len = retval;
+
if (!ext4_es_is_hole(&es)) {
up_write(&EXT4_I(inode)->i_data_sem);
goto found;
--
2.39.2
^ permalink raw reply related [flat|nested] 20+ messages in thread* [PATCH v5 05/10] ext4: drop iblock parameter
2024-05-17 12:39 [PATCH v5 00/10] ext4: support adding multi-delalloc blocks Zhang Yi
` (3 preceding siblings ...)
2024-05-17 12:39 ` [PATCH v5 04/10] ext4: trim delalloc extent Zhang Yi
@ 2024-05-17 12:40 ` Zhang Yi
2024-05-17 12:40 ` [PATCH v5 06/10] ext4: make ext4_es_insert_delayed_block() insert multi-blocks Zhang Yi
` (5 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Zhang Yi @ 2024-05-17 12:40 UTC (permalink / raw)
To: linux-ext4
Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3
From: Zhang Yi <yi.zhang@huawei.com>
The start block of the delalloc extent to be inserted is equal to
map->m_lblk, just drop the duplicate iblock input parameter.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
fs/ext4/inode.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fb76016e2ab7..de157aebc306 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1712,8 +1712,7 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
* time. This function looks up the requested blocks and sets the
* buffer delay bit under the protection of i_data_sem.
*/
-static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
- struct ext4_map_blocks *map,
+static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
struct buffer_head *bh)
{
struct extent_status es;
@@ -1733,8 +1732,8 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
(unsigned long) map->m_lblk);
/* Lookup extent status tree firstly */
- if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
- retval = es.es_len - (iblock - es.es_lblk);
+ if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
+ retval = es.es_len - (map->m_lblk - es.es_lblk);
if (retval > map->m_len)
retval = map->m_len;
map->m_len = retval;
@@ -1754,7 +1753,7 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
return 0;
}
- map->m_pblk = ext4_es_pblock(&es) + iblock - es.es_lblk;
+ map->m_pblk = ext4_es_pblock(&es) + map->m_lblk - es.es_lblk;
if (ext4_es_is_written(&es))
map->m_flags |= EXT4_MAP_MAPPED;
else if (ext4_es_is_unwritten(&es))
@@ -1790,8 +1789,8 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
* is held in write mode, before inserting a new da entry in
* the extent status tree.
*/
- if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
- retval = es.es_len - (iblock - es.es_lblk);
+ if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
+ retval = es.es_len - (map->m_lblk - es.es_lblk);
if (retval > map->m_len)
retval = map->m_len;
map->m_len = retval;
@@ -1848,7 +1847,7 @@ int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
* preallocated blocks are unmapped but should treated
* the same as allocated blocks.
*/
- ret = ext4_da_map_blocks(inode, iblock, &map, bh);
+ ret = ext4_da_map_blocks(inode, &map, bh);
if (ret <= 0)
return ret;
--
2.39.2
^ permalink raw reply related [flat|nested] 20+ messages in thread* [PATCH v5 06/10] ext4: make ext4_es_insert_delayed_block() insert multi-blocks
2024-05-17 12:39 [PATCH v5 00/10] ext4: support adding multi-delalloc blocks Zhang Yi
` (4 preceding siblings ...)
2024-05-17 12:40 ` [PATCH v5 05/10] ext4: drop iblock parameter Zhang Yi
@ 2024-05-17 12:40 ` Zhang Yi
2024-05-17 12:40 ` [PATCH v5 07/10] ext4: make ext4_da_reserve_space() reserve multi-clusters Zhang Yi
` (4 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Zhang Yi @ 2024-05-17 12:40 UTC (permalink / raw)
To: linux-ext4
Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3
From: Zhang Yi <yi.zhang@huawei.com>
Rename ext4_es_insert_delayed_block() to ext4_es_insert_delayed_extent()
and pass length parameter to make it insert multiple delalloc blocks at
a time. For the case of bigalloc, split the allocated parameter to
lclu_allocated and end_allocated. lclu_allocated indicates the
allocation state of the cluster which is containing the lblk,
end_allocated indicates the allocation state of the extent end, clusters
in the middle of delay allocated extent must be unallocated.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
fs/ext4/extents_status.c | 70 ++++++++++++++++++++++++++-----------
fs/ext4/extents_status.h | 5 +--
fs/ext4/inode.c | 2 +-
include/trace/events/ext4.h | 16 +++++----
4 files changed, 62 insertions(+), 31 deletions(-)
diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 4a00e2f019d9..23caf1f028b0 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -2052,34 +2052,49 @@ bool ext4_is_pending(struct inode *inode, ext4_lblk_t lblk)
}
/*
- * ext4_es_insert_delayed_block - adds a delayed block to the extents status
- * tree, adding a pending reservation where
- * needed
+ * ext4_es_insert_delayed_extent - adds some delayed blocks to the extents
+ * status tree, adding a pending reservation
+ * where needed
*
* @inode - file containing the newly added block
- * @lblk - logical block to be added
- * @allocated - indicates whether a physical cluster has been allocated for
- * the logical cluster that contains the block
+ * @lblk - start logical block to be added
+ * @len - length of blocks to be added
+ * @lclu_allocated/end_allocated - indicates whether a physical cluster has
+ * been allocated for the logical cluster
+ * that contains the start/end block. Note that
+ * end_allocated should always be set to false
+ * if the start and the end block are in the
+ * same cluster
*/
-void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
- bool allocated)
+void ext4_es_insert_delayed_extent(struct inode *inode, ext4_lblk_t lblk,
+ ext4_lblk_t len, bool lclu_allocated,
+ bool end_allocated)
{
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
struct extent_status newes;
+ ext4_lblk_t end = lblk + len - 1;
int err1 = 0, err2 = 0, err3 = 0;
struct extent_status *es1 = NULL;
struct extent_status *es2 = NULL;
- struct pending_reservation *pr = NULL;
+ struct pending_reservation *pr1 = NULL;
+ struct pending_reservation *pr2 = NULL;
if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
return;
- es_debug("add [%u/1) delayed to extent status tree of inode %lu\n",
- lblk, inode->i_ino);
+ es_debug("add [%u/%u) delayed to extent status tree of inode %lu\n",
+ lblk, len, inode->i_ino);
+ if (!len)
+ return;
+
+ WARN_ON_ONCE((EXT4_B2C(sbi, lblk) == EXT4_B2C(sbi, end)) &&
+ end_allocated);
newes.es_lblk = lblk;
- newes.es_len = 1;
+ newes.es_len = len;
ext4_es_store_pblock_status(&newes, ~0, EXTENT_STATUS_DELAYED);
- trace_ext4_es_insert_delayed_block(inode, &newes, allocated);
+ trace_ext4_es_insert_delayed_extent(inode, &newes, lclu_allocated,
+ end_allocated);
ext4_es_insert_extent_check(inode, &newes);
@@ -2088,11 +2103,15 @@ void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
es1 = __es_alloc_extent(true);
if ((err1 || err2) && !es2)
es2 = __es_alloc_extent(true);
- if ((err1 || err2 || err3) && allocated && !pr)
- pr = __alloc_pending(true);
+ if (err1 || err2 || err3) {
+ if (lclu_allocated && !pr1)
+ pr1 = __alloc_pending(true);
+ if (end_allocated && !pr2)
+ pr2 = __alloc_pending(true);
+ }
write_lock(&EXT4_I(inode)->i_es_lock);
- err1 = __es_remove_extent(inode, lblk, lblk, NULL, es1);
+ err1 = __es_remove_extent(inode, lblk, end, NULL, es1);
if (err1 != 0)
goto error;
/* Free preallocated extent if it didn't get used. */
@@ -2112,13 +2131,22 @@ void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
es2 = NULL;
}
- if (allocated) {
- err3 = __insert_pending(inode, lblk, &pr);
+ if (lclu_allocated) {
+ err3 = __insert_pending(inode, lblk, &pr1);
if (err3 != 0)
goto error;
- if (pr) {
- __free_pending(pr);
- pr = NULL;
+ if (pr1) {
+ __free_pending(pr1);
+ pr1 = NULL;
+ }
+ }
+ if (end_allocated) {
+ err3 = __insert_pending(inode, end, &pr2);
+ if (err3 != 0)
+ goto error;
+ if (pr2) {
+ __free_pending(pr2);
+ pr2 = NULL;
}
}
error:
diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
index d9847a4a25db..3c8e2edee5d5 100644
--- a/fs/ext4/extents_status.h
+++ b/fs/ext4/extents_status.h
@@ -249,8 +249,9 @@ extern void ext4_exit_pending(void);
extern void ext4_init_pending_tree(struct ext4_pending_tree *tree);
extern void ext4_remove_pending(struct inode *inode, ext4_lblk_t lblk);
extern bool ext4_is_pending(struct inode *inode, ext4_lblk_t lblk);
-extern void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
- bool allocated);
+extern void ext4_es_insert_delayed_extent(struct inode *inode, ext4_lblk_t lblk,
+ ext4_lblk_t len, bool lclu_allocated,
+ bool end_allocated);
extern unsigned int ext4_es_delayed_clu(struct inode *inode, ext4_lblk_t lblk,
ext4_lblk_t len);
extern void ext4_clear_inode_es(struct inode *inode);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index de157aebc306..f64fe8b873ce 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1702,7 +1702,7 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
}
}
- ext4_es_insert_delayed_block(inode, lblk, allocated);
+ ext4_es_insert_delayed_extent(inode, lblk, 1, allocated, false);
return 0;
}
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index a697f4b77162..6b41ac61310f 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -2478,11 +2478,11 @@ TRACE_EVENT(ext4_es_shrink,
__entry->scan_time, __entry->nr_skipped, __entry->retried)
);
-TRACE_EVENT(ext4_es_insert_delayed_block,
+TRACE_EVENT(ext4_es_insert_delayed_extent,
TP_PROTO(struct inode *inode, struct extent_status *es,
- bool allocated),
+ bool lclu_allocated, bool end_allocated),
- TP_ARGS(inode, es, allocated),
+ TP_ARGS(inode, es, lclu_allocated, end_allocated),
TP_STRUCT__entry(
__field( dev_t, dev )
@@ -2491,7 +2491,8 @@ TRACE_EVENT(ext4_es_insert_delayed_block,
__field( ext4_lblk_t, len )
__field( ext4_fsblk_t, pblk )
__field( char, status )
- __field( bool, allocated )
+ __field( bool, lclu_allocated )
+ __field( bool, end_allocated )
),
TP_fast_assign(
@@ -2501,16 +2502,17 @@ TRACE_EVENT(ext4_es_insert_delayed_block,
__entry->len = es->es_len;
__entry->pblk = ext4_es_show_pblock(es);
__entry->status = ext4_es_status(es);
- __entry->allocated = allocated;
+ __entry->lclu_allocated = lclu_allocated;
+ __entry->end_allocated = end_allocated;
),
TP_printk("dev %d,%d ino %lu es [%u/%u) mapped %llu status %s "
- "allocated %d",
+ "allocated %d %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
__entry->lblk, __entry->len,
__entry->pblk, show_extent_status(__entry->status),
- __entry->allocated)
+ __entry->lclu_allocated, __entry->end_allocated)
);
/* fsmap traces */
--
2.39.2
^ permalink raw reply related [flat|nested] 20+ messages in thread* [PATCH v5 07/10] ext4: make ext4_da_reserve_space() reserve multi-clusters
2024-05-17 12:39 [PATCH v5 00/10] ext4: support adding multi-delalloc blocks Zhang Yi
` (5 preceding siblings ...)
2024-05-17 12:40 ` [PATCH v5 06/10] ext4: make ext4_es_insert_delayed_block() insert multi-blocks Zhang Yi
@ 2024-05-17 12:40 ` Zhang Yi
2024-05-17 12:40 ` [PATCH v5 08/10] ext4: factor out a helper to check the cluster allocation state Zhang Yi
` (3 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Zhang Yi @ 2024-05-17 12:40 UTC (permalink / raw)
To: linux-ext4
Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3
From: Zhang Yi <yi.zhang@huawei.com>
Add 'nr_resv' parameter to ext4_da_reserve_space(), which indicates the
number of clusters wants to reserve, make it reserve multiple clusters
at a time.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
fs/ext4/inode.c | 18 +++++++++---------
include/trace/events/ext4.h | 10 ++++++----
2 files changed, 15 insertions(+), 13 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f64fe8b873ce..0c52969654ac 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1479,9 +1479,9 @@ static int ext4_journalled_write_end(struct file *file,
}
/*
- * Reserve space for a single cluster
+ * Reserve space for 'nr_resv' clusters
*/
-static int ext4_da_reserve_space(struct inode *inode)
+static int ext4_da_reserve_space(struct inode *inode, int nr_resv)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
struct ext4_inode_info *ei = EXT4_I(inode);
@@ -1492,18 +1492,18 @@ static int ext4_da_reserve_space(struct inode *inode)
* us from metadata over-estimation, though we may go over by
* a small amount in the end. Here we just reserve for data.
*/
- ret = dquot_reserve_block(inode, EXT4_C2B(sbi, 1));
+ ret = dquot_reserve_block(inode, EXT4_C2B(sbi, nr_resv));
if (ret)
return ret;
spin_lock(&ei->i_block_reservation_lock);
- if (ext4_claim_free_clusters(sbi, 1, 0)) {
+ if (ext4_claim_free_clusters(sbi, nr_resv, 0)) {
spin_unlock(&ei->i_block_reservation_lock);
- dquot_release_reservation_block(inode, EXT4_C2B(sbi, 1));
+ dquot_release_reservation_block(inode, EXT4_C2B(sbi, nr_resv));
return -ENOSPC;
}
- ei->i_reserved_data_blocks++;
- trace_ext4_da_reserve_space(inode);
+ ei->i_reserved_data_blocks += nr_resv;
+ trace_ext4_da_reserve_space(inode, nr_resv);
spin_unlock(&ei->i_block_reservation_lock);
return 0; /* success */
@@ -1678,7 +1678,7 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
* extents status tree doesn't get a match.
*/
if (sbi->s_cluster_ratio == 1) {
- ret = ext4_da_reserve_space(inode);
+ ret = ext4_da_reserve_space(inode, 1);
if (ret != 0) /* ENOSPC */
return ret;
} else { /* bigalloc */
@@ -1690,7 +1690,7 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
if (ret < 0)
return ret;
if (ret == 0) {
- ret = ext4_da_reserve_space(inode);
+ ret = ext4_da_reserve_space(inode, 1);
if (ret != 0) /* ENOSPC */
return ret;
} else {
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 6b41ac61310f..cc5e9b7b2b44 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -1246,14 +1246,15 @@ TRACE_EVENT(ext4_da_update_reserve_space,
);
TRACE_EVENT(ext4_da_reserve_space,
- TP_PROTO(struct inode *inode),
+ TP_PROTO(struct inode *inode, int nr_resv),
- TP_ARGS(inode),
+ TP_ARGS(inode, nr_resv),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( __u64, i_blocks )
+ __field( int, reserve_blocks )
__field( int, reserved_data_blocks )
__field( __u16, mode )
),
@@ -1262,16 +1263,17 @@ TRACE_EVENT(ext4_da_reserve_space,
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->i_blocks = inode->i_blocks;
+ __entry->reserve_blocks = nr_resv;
__entry->reserved_data_blocks = EXT4_I(inode)->i_reserved_data_blocks;
__entry->mode = inode->i_mode;
),
- TP_printk("dev %d,%d ino %lu mode 0%o i_blocks %llu "
+ TP_printk("dev %d,%d ino %lu mode 0%o i_blocks %llu reserve_blocks %d"
"reserved_data_blocks %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
__entry->mode, __entry->i_blocks,
- __entry->reserved_data_blocks)
+ __entry->reserve_blocks, __entry->reserved_data_blocks)
);
TRACE_EVENT(ext4_da_release_space,
--
2.39.2
^ permalink raw reply related [flat|nested] 20+ messages in thread* [PATCH v5 08/10] ext4: factor out a helper to check the cluster allocation state
2024-05-17 12:39 [PATCH v5 00/10] ext4: support adding multi-delalloc blocks Zhang Yi
` (6 preceding siblings ...)
2024-05-17 12:40 ` [PATCH v5 07/10] ext4: make ext4_da_reserve_space() reserve multi-clusters Zhang Yi
@ 2024-05-17 12:40 ` Zhang Yi
2024-05-20 9:37 ` Jan Kara
2024-05-17 12:40 ` [PATCH v5 09/10] ext4: make ext4_insert_delayed_block() insert multi-blocks Zhang Yi
` (2 subsequent siblings)
10 siblings, 1 reply; 20+ messages in thread
From: Zhang Yi @ 2024-05-17 12:40 UTC (permalink / raw)
To: linux-ext4
Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3
From: Zhang Yi <yi.zhang@huawei.com>
Factor out a common helper ext4_clu_alloc_state(), check whether the
cluster containing a delalloc block to be added has been allocated or
has delalloc reservation, no logic changes.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
fs/ext4/inode.c | 55 ++++++++++++++++++++++++++++++++++---------------
1 file changed, 38 insertions(+), 17 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0c52969654ac..eefedb7264c7 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1649,6 +1649,35 @@ static void ext4_print_free_blocks(struct inode *inode)
return;
}
+/*
+ * Check whether the cluster containing lblk has been allocated or has
+ * delalloc reservation.
+ *
+ * Returns 0 if the cluster doesn't have either, 1 if it has delalloc
+ * reservation, 2 if it's already been allocated, negative error code on
+ * failure.
+ */
+static int ext4_clu_alloc_state(struct inode *inode, ext4_lblk_t lblk)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+ int ret;
+
+ /* Has delalloc reservation? */
+ if (ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk))
+ return 1;
+
+ /* Already been allocated? */
+ if (ext4_es_scan_clu(inode, &ext4_es_is_mapped, lblk))
+ return 2;
+ ret = ext4_clu_mapped(inode, EXT4_B2C(sbi, lblk));
+ if (ret < 0)
+ return ret;
+ if (ret > 0)
+ return 2;
+
+ return 0;
+}
+
/*
* ext4_insert_delayed_block - adds a delayed block to the extents status
* tree, incrementing the reserved cluster/block
@@ -1682,23 +1711,15 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
if (ret != 0) /* ENOSPC */
return ret;
} else { /* bigalloc */
- if (!ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk)) {
- if (!ext4_es_scan_clu(inode,
- &ext4_es_is_mapped, lblk)) {
- ret = ext4_clu_mapped(inode,
- EXT4_B2C(sbi, lblk));
- if (ret < 0)
- return ret;
- if (ret == 0) {
- ret = ext4_da_reserve_space(inode, 1);
- if (ret != 0) /* ENOSPC */
- return ret;
- } else {
- allocated = true;
- }
- } else {
- allocated = true;
- }
+ ret = ext4_clu_alloc_state(inode, lblk);
+ if (ret < 0)
+ return ret;
+ if (ret == 2)
+ allocated = true;
+ if (ret == 0) {
+ ret = ext4_da_reserve_space(inode, 1);
+ if (ret != 0) /* ENOSPC */
+ return ret;
}
}
--
2.39.2
^ permalink raw reply related [flat|nested] 20+ messages in thread* Re: [PATCH v5 08/10] ext4: factor out a helper to check the cluster allocation state
2024-05-17 12:40 ` [PATCH v5 08/10] ext4: factor out a helper to check the cluster allocation state Zhang Yi
@ 2024-05-20 9:37 ` Jan Kara
0 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2024-05-20 9:37 UTC (permalink / raw)
To: Zhang Yi
Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3
On Fri 17-05-24 20:40:03, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
>
> Factor out a common helper ext4_clu_alloc_state(), check whether the
> cluster containing a delalloc block to be added has been allocated or
> has delalloc reservation, no logic changes.
>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/ext4/inode.c | 55 ++++++++++++++++++++++++++++++++++---------------
> 1 file changed, 38 insertions(+), 17 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 0c52969654ac..eefedb7264c7 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1649,6 +1649,35 @@ static void ext4_print_free_blocks(struct inode *inode)
> return;
> }
>
> +/*
> + * Check whether the cluster containing lblk has been allocated or has
> + * delalloc reservation.
> + *
> + * Returns 0 if the cluster doesn't have either, 1 if it has delalloc
> + * reservation, 2 if it's already been allocated, negative error code on
> + * failure.
> + */
> +static int ext4_clu_alloc_state(struct inode *inode, ext4_lblk_t lblk)
> +{
> + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> + int ret;
> +
> + /* Has delalloc reservation? */
> + if (ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk))
> + return 1;
> +
> + /* Already been allocated? */
> + if (ext4_es_scan_clu(inode, &ext4_es_is_mapped, lblk))
> + return 2;
> + ret = ext4_clu_mapped(inode, EXT4_B2C(sbi, lblk));
> + if (ret < 0)
> + return ret;
> + if (ret > 0)
> + return 2;
> +
> + return 0;
> +}
> +
> /*
> * ext4_insert_delayed_block - adds a delayed block to the extents status
> * tree, incrementing the reserved cluster/block
> @@ -1682,23 +1711,15 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
> if (ret != 0) /* ENOSPC */
> return ret;
> } else { /* bigalloc */
> - if (!ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk)) {
> - if (!ext4_es_scan_clu(inode,
> - &ext4_es_is_mapped, lblk)) {
> - ret = ext4_clu_mapped(inode,
> - EXT4_B2C(sbi, lblk));
> - if (ret < 0)
> - return ret;
> - if (ret == 0) {
> - ret = ext4_da_reserve_space(inode, 1);
> - if (ret != 0) /* ENOSPC */
> - return ret;
> - } else {
> - allocated = true;
> - }
> - } else {
> - allocated = true;
> - }
> + ret = ext4_clu_alloc_state(inode, lblk);
> + if (ret < 0)
> + return ret;
> + if (ret == 2)
> + allocated = true;
> + if (ret == 0) {
> + ret = ext4_da_reserve_space(inode, 1);
> + if (ret != 0) /* ENOSPC */
> + return ret;
> }
> }
>
> --
> 2.39.2
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v5 09/10] ext4: make ext4_insert_delayed_block() insert multi-blocks
2024-05-17 12:39 [PATCH v5 00/10] ext4: support adding multi-delalloc blocks Zhang Yi
` (7 preceding siblings ...)
2024-05-17 12:40 ` [PATCH v5 08/10] ext4: factor out a helper to check the cluster allocation state Zhang Yi
@ 2024-05-17 12:40 ` Zhang Yi
2024-05-20 9:39 ` Jan Kara
2024-05-17 12:40 ` [PATCH v5 10/10] ext4: make ext4_da_map_blocks() buffer_head unaware Zhang Yi
2024-06-28 17:17 ` [PATCH v5 00/10] ext4: support adding multi-delalloc blocks Theodore Ts'o
10 siblings, 1 reply; 20+ messages in thread
From: Zhang Yi @ 2024-05-17 12:40 UTC (permalink / raw)
To: linux-ext4
Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3
From: Zhang Yi <yi.zhang@huawei.com>
Rename ext4_insert_delayed_block() to ext4_insert_delayed_blocks(),
pass length parameter to make it insert multiple delalloc blocks at a
time. For non-bigalloc case, just reserve len blocks and insert delalloc
extent. For bigalloc case, we can ensure that the clusters in the middle
of a extent must be unallocated, we only need to check whether the start
and end clusters are delayed/allocated. We should subtract the space for
the start and/or end block(s) if they are allocated.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
fs/ext4/inode.c | 51 ++++++++++++++++++++++++++++++++++---------------
1 file changed, 36 insertions(+), 15 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index eefedb7264c7..4febee4c833f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1679,24 +1679,29 @@ static int ext4_clu_alloc_state(struct inode *inode, ext4_lblk_t lblk)
}
/*
- * ext4_insert_delayed_block - adds a delayed block to the extents status
- * tree, incrementing the reserved cluster/block
- * count or making a pending reservation
- * where needed
+ * ext4_insert_delayed_blocks - adds a multiple delayed blocks to the extents
+ * status tree, incrementing the reserved
+ * cluster/block count or making pending
+ * reservations where needed
*
* @inode - file containing the newly added block
- * @lblk - logical block to be added
+ * @lblk - start logical block to be added
+ * @len - length of blocks to be added
*
* Returns 0 on success, negative error code on failure.
*/
-static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
+static int ext4_insert_delayed_blocks(struct inode *inode, ext4_lblk_t lblk,
+ ext4_lblk_t len)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
int ret;
- bool allocated = false;
+ bool lclu_allocated = false;
+ bool end_allocated = false;
+ ext4_lblk_t resv_clu;
+ ext4_lblk_t end = lblk + len - 1;
/*
- * If the cluster containing lblk is shared with a delayed,
+ * If the cluster containing lblk or end is shared with a delayed,
* written, or unwritten extent in a bigalloc file system, it's
* already been accounted for and does not need to be reserved.
* A pending reservation must be made for the cluster if it's
@@ -1707,23 +1712,39 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
* extents status tree doesn't get a match.
*/
if (sbi->s_cluster_ratio == 1) {
- ret = ext4_da_reserve_space(inode, 1);
+ ret = ext4_da_reserve_space(inode, len);
if (ret != 0) /* ENOSPC */
return ret;
} else { /* bigalloc */
+ resv_clu = EXT4_B2C(sbi, end) - EXT4_B2C(sbi, lblk) + 1;
+
ret = ext4_clu_alloc_state(inode, lblk);
if (ret < 0)
return ret;
- if (ret == 2)
- allocated = true;
- if (ret == 0) {
- ret = ext4_da_reserve_space(inode, 1);
+ if (ret > 0) {
+ resv_clu--;
+ lclu_allocated = (ret == 2);
+ }
+
+ if (EXT4_B2C(sbi, lblk) != EXT4_B2C(sbi, end)) {
+ ret = ext4_clu_alloc_state(inode, end);
+ if (ret < 0)
+ return ret;
+ if (ret > 0) {
+ resv_clu--;
+ end_allocated = (ret == 2);
+ }
+ }
+
+ if (resv_clu) {
+ ret = ext4_da_reserve_space(inode, resv_clu);
if (ret != 0) /* ENOSPC */
return ret;
}
}
- ext4_es_insert_delayed_extent(inode, lblk, 1, allocated, false);
+ ext4_es_insert_delayed_extent(inode, lblk, len, lclu_allocated,
+ end_allocated);
return 0;
}
@@ -1828,7 +1849,7 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
}
}
- retval = ext4_insert_delayed_block(inode, map->m_lblk);
+ retval = ext4_insert_delayed_blocks(inode, map->m_lblk, map->m_len);
up_write(&EXT4_I(inode)->i_data_sem);
if (retval)
return retval;
--
2.39.2
^ permalink raw reply related [flat|nested] 20+ messages in thread* Re: [PATCH v5 09/10] ext4: make ext4_insert_delayed_block() insert multi-blocks
2024-05-17 12:40 ` [PATCH v5 09/10] ext4: make ext4_insert_delayed_block() insert multi-blocks Zhang Yi
@ 2024-05-20 9:39 ` Jan Kara
0 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2024-05-20 9:39 UTC (permalink / raw)
To: Zhang Yi
Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3
On Fri 17-05-24 20:40:04, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
>
> Rename ext4_insert_delayed_block() to ext4_insert_delayed_blocks(),
> pass length parameter to make it insert multiple delalloc blocks at a
> time. For non-bigalloc case, just reserve len blocks and insert delalloc
> extent. For bigalloc case, we can ensure that the clusters in the middle
> of a extent must be unallocated, we only need to check whether the start
> and end clusters are delayed/allocated. We should subtract the space for
> the start and/or end block(s) if they are allocated.
>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/ext4/inode.c | 51 ++++++++++++++++++++++++++++++++++---------------
> 1 file changed, 36 insertions(+), 15 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index eefedb7264c7..4febee4c833f 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1679,24 +1679,29 @@ static int ext4_clu_alloc_state(struct inode *inode, ext4_lblk_t lblk)
> }
>
> /*
> - * ext4_insert_delayed_block - adds a delayed block to the extents status
> - * tree, incrementing the reserved cluster/block
> - * count or making a pending reservation
> - * where needed
> + * ext4_insert_delayed_blocks - adds a multiple delayed blocks to the extents
> + * status tree, incrementing the reserved
> + * cluster/block count or making pending
> + * reservations where needed
> *
> * @inode - file containing the newly added block
> - * @lblk - logical block to be added
> + * @lblk - start logical block to be added
> + * @len - length of blocks to be added
> *
> * Returns 0 on success, negative error code on failure.
> */
> -static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
> +static int ext4_insert_delayed_blocks(struct inode *inode, ext4_lblk_t lblk,
> + ext4_lblk_t len)
> {
> struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> int ret;
> - bool allocated = false;
> + bool lclu_allocated = false;
> + bool end_allocated = false;
> + ext4_lblk_t resv_clu;
> + ext4_lblk_t end = lblk + len - 1;
>
> /*
> - * If the cluster containing lblk is shared with a delayed,
> + * If the cluster containing lblk or end is shared with a delayed,
> * written, or unwritten extent in a bigalloc file system, it's
> * already been accounted for and does not need to be reserved.
> * A pending reservation must be made for the cluster if it's
> @@ -1707,23 +1712,39 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
> * extents status tree doesn't get a match.
> */
> if (sbi->s_cluster_ratio == 1) {
> - ret = ext4_da_reserve_space(inode, 1);
> + ret = ext4_da_reserve_space(inode, len);
> if (ret != 0) /* ENOSPC */
> return ret;
> } else { /* bigalloc */
> + resv_clu = EXT4_B2C(sbi, end) - EXT4_B2C(sbi, lblk) + 1;
> +
> ret = ext4_clu_alloc_state(inode, lblk);
> if (ret < 0)
> return ret;
> - if (ret == 2)
> - allocated = true;
> - if (ret == 0) {
> - ret = ext4_da_reserve_space(inode, 1);
> + if (ret > 0) {
> + resv_clu--;
> + lclu_allocated = (ret == 2);
> + }
> +
> + if (EXT4_B2C(sbi, lblk) != EXT4_B2C(sbi, end)) {
> + ret = ext4_clu_alloc_state(inode, end);
> + if (ret < 0)
> + return ret;
> + if (ret > 0) {
> + resv_clu--;
> + end_allocated = (ret == 2);
> + }
> + }
> +
> + if (resv_clu) {
> + ret = ext4_da_reserve_space(inode, resv_clu);
> if (ret != 0) /* ENOSPC */
> return ret;
> }
> }
>
> - ext4_es_insert_delayed_extent(inode, lblk, 1, allocated, false);
> + ext4_es_insert_delayed_extent(inode, lblk, len, lclu_allocated,
> + end_allocated);
> return 0;
> }
>
> @@ -1828,7 +1849,7 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
> }
> }
>
> - retval = ext4_insert_delayed_block(inode, map->m_lblk);
> + retval = ext4_insert_delayed_blocks(inode, map->m_lblk, map->m_len);
> up_write(&EXT4_I(inode)->i_data_sem);
> if (retval)
> return retval;
> --
> 2.39.2
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v5 10/10] ext4: make ext4_da_map_blocks() buffer_head unaware
2024-05-17 12:39 [PATCH v5 00/10] ext4: support adding multi-delalloc blocks Zhang Yi
` (8 preceding siblings ...)
2024-05-17 12:40 ` [PATCH v5 09/10] ext4: make ext4_insert_delayed_block() insert multi-blocks Zhang Yi
@ 2024-05-17 12:40 ` Zhang Yi
2024-06-28 17:17 ` [PATCH v5 00/10] ext4: support adding multi-delalloc blocks Theodore Ts'o
10 siblings, 0 replies; 20+ messages in thread
From: Zhang Yi @ 2024-05-17 12:40 UTC (permalink / raw)
To: linux-ext4
Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3
From: Zhang Yi <yi.zhang@huawei.com>
After calling the ext4_da_map_blocks(), a delalloc extent state could
be identified through the EXT4_MAP_DELAYED flag in map. So factor out
buffer_head related handles in ext4_da_map_blocks(), make this function
buffer_head unaware and becomes a common helper, and also update the
stale function commtents, preparing for the iomap da write path in the
future.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
fs/ext4/inode.c | 63 ++++++++++++++++++++++++-------------------------
1 file changed, 31 insertions(+), 32 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 4febee4c833f..2afa3f83ddfb 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1749,36 +1749,32 @@ static int ext4_insert_delayed_blocks(struct inode *inode, ext4_lblk_t lblk,
}
/*
- * This function is grabs code from the very beginning of
- * ext4_map_blocks, but assumes that the caller is from delayed write
- * time. This function looks up the requested blocks and sets the
- * buffer delay bit under the protection of i_data_sem.
+ * Looks up the requested blocks and sets the delalloc extent map.
+ * First try to look up for the extent entry that contains the requested
+ * blocks in the extent status tree without i_data_sem, then try to look
+ * up for the ondisk extent mapping with i_data_sem in read mode,
+ * finally hold i_data_sem in write mode, looks up again and add a
+ * delalloc extent entry if it still couldn't find any extent. Pass out
+ * the mapped extent through @map and return 0 on success.
*/
-static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
- struct buffer_head *bh)
+static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map)
{
struct extent_status es;
int retval;
- sector_t invalid_block = ~((sector_t) 0xffff);
#ifdef ES_AGGRESSIVE_TEST
struct ext4_map_blocks orig_map;
memcpy(&orig_map, map, sizeof(*map));
#endif
- if (invalid_block < ext4_blocks_count(EXT4_SB(inode->i_sb)->s_es))
- invalid_block = ~0;
-
map->m_flags = 0;
ext_debug(inode, "max_blocks %u, logical block %lu\n", map->m_len,
(unsigned long) map->m_lblk);
/* Lookup extent status tree firstly */
if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
- retval = es.es_len - (map->m_lblk - es.es_lblk);
- if (retval > map->m_len)
- retval = map->m_len;
- map->m_len = retval;
+ map->m_len = min_t(unsigned int, map->m_len,
+ es.es_len - (map->m_lblk - es.es_lblk));
if (ext4_es_is_hole(&es))
goto add_delayed;
@@ -1788,10 +1784,8 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
* Delayed extent could be allocated by fallocate.
* So we need to check it.
*/
- if (ext4_es_is_delayed(&es) && !ext4_es_is_unwritten(&es)) {
- map_bh(bh, inode->i_sb, invalid_block);
- set_buffer_new(bh);
- set_buffer_delay(bh);
+ if (ext4_es_is_delonly(&es)) {
+ map->m_flags |= EXT4_MAP_DELAYED;
return 0;
}
@@ -1806,7 +1800,7 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
#ifdef ES_AGGRESSIVE_TEST
ext4_map_blocks_es_recheck(NULL, inode, map, &orig_map, 0);
#endif
- return retval;
+ return 0;
}
/*
@@ -1820,7 +1814,7 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
retval = ext4_map_query_blocks(NULL, inode, map);
up_read(&EXT4_I(inode)->i_data_sem);
if (retval)
- return retval;
+ return retval < 0 ? retval : 0;
add_delayed:
down_write(&EXT4_I(inode)->i_data_sem);
@@ -1832,10 +1826,8 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
* the extent status tree.
*/
if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
- retval = es.es_len - (map->m_lblk - es.es_lblk);
- if (retval > map->m_len)
- retval = map->m_len;
- map->m_len = retval;
+ map->m_len = min_t(unsigned int, map->m_len,
+ es.es_len - (map->m_lblk - es.es_lblk));
if (!ext4_es_is_hole(&es)) {
up_write(&EXT4_I(inode)->i_data_sem);
@@ -1845,18 +1837,14 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
retval = ext4_map_query_blocks(NULL, inode, map);
if (retval) {
up_write(&EXT4_I(inode)->i_data_sem);
- return retval;
+ return retval < 0 ? retval : 0;
}
}
+ map->m_flags |= EXT4_MAP_DELAYED;
retval = ext4_insert_delayed_blocks(inode, map->m_lblk, map->m_len);
up_write(&EXT4_I(inode)->i_data_sem);
- if (retval)
- return retval;
- map_bh(bh, inode->i_sb, invalid_block);
- set_buffer_new(bh);
- set_buffer_delay(bh);
return retval;
}
@@ -1876,11 +1864,15 @@ int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
struct buffer_head *bh, int create)
{
struct ext4_map_blocks map;
+ sector_t invalid_block = ~((sector_t) 0xffff);
int ret = 0;
BUG_ON(create == 0);
BUG_ON(bh->b_size != inode->i_sb->s_blocksize);
+ if (invalid_block < ext4_blocks_count(EXT4_SB(inode->i_sb)->s_es))
+ invalid_block = ~0;
+
map.m_lblk = iblock;
map.m_len = 1;
@@ -1889,10 +1881,17 @@ int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
* preallocated blocks are unmapped but should treated
* the same as allocated blocks.
*/
- ret = ext4_da_map_blocks(inode, &map, bh);
- if (ret <= 0)
+ ret = ext4_da_map_blocks(inode, &map);
+ if (ret < 0)
return ret;
+ if (map.m_flags & EXT4_MAP_DELAYED) {
+ map_bh(bh, inode->i_sb, invalid_block);
+ set_buffer_new(bh);
+ set_buffer_delay(bh);
+ return 0;
+ }
+
map_bh(bh, inode->i_sb, map.m_pblk);
ext4_update_bh_state(bh, map.m_flags);
--
2.39.2
^ permalink raw reply related [flat|nested] 20+ messages in thread* Re: [PATCH v5 00/10] ext4: support adding multi-delalloc blocks
2024-05-17 12:39 [PATCH v5 00/10] ext4: support adding multi-delalloc blocks Zhang Yi
` (9 preceding siblings ...)
2024-05-17 12:40 ` [PATCH v5 10/10] ext4: make ext4_da_map_blocks() buffer_head unaware Zhang Yi
@ 2024-06-28 17:17 ` Theodore Ts'o
10 siblings, 0 replies; 20+ messages in thread
From: Theodore Ts'o @ 2024-06-28 17:17 UTC (permalink / raw)
To: linux-ext4, Zhang Yi
Cc: Theodore Ts'o, linux-fsdevel, linux-kernel, adilger.kernel,
jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3
On Fri, 17 May 2024 20:39:55 +0800, Zhang Yi wrote:
> Changes since v4:
> - In patch 3, switch to check EXT4_ERROR_FS instead of
> ext4_forced_shutdown() to prevent warning on errors=continue mode as
> Jan suggested.
> - In patch 8, rename ext4_da_check_clu_allocated() to
> ext4_clu_alloc_state() and change the return value according to the
> cluster allocation state as Jan suggested.
> - In patch 9, do some appropriate logic changes since
> the ext4_clu_alloc_state() has been changed in patch 8, so I remove
> the reviewed-by tag from Jan, please take a look again.
>
> [...]
Applied, thanks!
[01/10] ext4: factor out a common helper to query extent map
commit: 8e4e5cdf2fdeb99445a468b6b6436ad79b9ecb30
[02/10] ext4: check the extent status again before inserting delalloc block
commit: 0ea6560abb3bac1ffcfa4bf6b2c4d344fdc27b3c
[03/10] ext4: warn if delalloc counters are not zero on inactive
commit: b37c907073e80016333b442c845c3acc198e570f
[04/10] ext4: trim delalloc extent
commit: 14a210c110d1ffbef4ed56e88e3c34de04790ff8
[05/10] ext4: drop iblock parameter
commit: bb6b18057f18e9b7f53a524721060652de151e8a
[06/10] ext4: make ext4_es_insert_delayed_block() insert multi-blocks
commit: 12eba993b94c9186e44a01f46e38b3ab3c635f2d
[07/10] ext4: make ext4_da_reserve_space() reserve multi-clusters
commit: 0d66b23d79c750276f791411d81a524549a64852
[08/10] ext4: factor out a helper to check the cluster allocation state
commit: 49bf6ab4d30b7a39d86a585e0a58f6c449d2e009
[09/10] ext4: make ext4_insert_delayed_block() insert multi-blocks
commit: 1850d76c1b781ad9c7dc3c4968fb40c1915231c0
[10/10] ext4: make ext4_da_map_blocks() buffer_head unaware
commit: 8262fe9a902c8a7b68c8521ebe18360a9145bada
Best regards,
--
Theodore Ts'o <tytso@mit.edu>
^ permalink raw reply [flat|nested] 20+ messages in thread