public inbox for linux-fsdevel@vger.kernel.org
* [RFC PATCH] ext4: fix e4b bitmap inconsistency reports
@ 2026-01-06  9:08 Yongjian Sun
  2026-01-06 10:56 ` Jan Kara
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Yongjian Sun @ 2026-01-06  9:08 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, tytso, jack, yangerkun, yi.zhang, libaokun1,
	chengzhihao1, sunyongjian1

From: Yongjian Sun <sunyongjian1@huawei.com>

A bitmap inconsistency issue was observed during stress tests under
mixed huge-page workloads. Ext4 reported multiple e4b bitmap check
failures like:

ext4_mb_complex_scan_group:2508: group 350, 8179 free clusters as
per group info. But got 8192 blocks

Analysis and experimentation confirmed that the issue is caused by a
race condition between page migration and bitmap modification. Although
this timing window is extremely narrow, it is still hit in practice:

folio_lock                        ext4_mb_load_buddy
__migrate_folio
  check ref count
  folio_mc_copy                     __filemap_get_folio
                                      folio_try_get(folio)
                                  ......
                                  mb_mark_used
                                  ext4_mb_unload_buddy
  __folio_migrate_mapping
    folio_ref_freeze
folio_unlock

The root cause of this issue is that the fast path of load_buddy only
increments the folio's reference count, which is insufficient to prevent
concurrent folio migration. We observed that the folio migration process
acquires the folio lock. Therefore, we can determine whether to take the
fast path in load_buddy by checking the lock status. If the folio is
locked, we opt for the slow path (which acquires the lock) to close this
concurrency window.
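
As an illustration only, the fast-path decision before and after this
change can be modelled in userspace C. This is a sketch, not kernel code:
struct model_folio and both helper names are hypothetical, and the three
booleans merely stand in for !IS_ERR(folio), folio_test_uptodate() and
folio_test_locked().

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the folio state the patch inspects. */
struct model_folio {
	bool present;	/* models !IS_ERR(folio) */
	bool uptodate;	/* models folio_test_uptodate(folio) */
	bool locked;	/* models folio_test_locked(folio) */
};

/* Pre-patch rule: an uptodate folio is used without taking the lock. */
static bool fast_path_ok_old(const struct model_folio *f)
{
	return f->present && f->uptodate;
}

/*
 * Post-patch rule: a locked folio (for example, one under migration)
 * is also sent to the slow path, which takes the folio lock.
 */
static bool fast_path_ok_new(const struct model_folio *f)
{
	return f->present && f->uptodate && !f->locked;
}
```

For a folio that is uptodate but locked, the old rule still allows the
fast path while the new rule rejects it; for an unlocked uptodate folio
both rules agree, so the common case should keep avoiding the lock.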

Additionally, this change addresses the following issue:

When the DOUBLE_CHECK macro is enabled to inspect bitmap-related
issues, the following error may be triggered:

corruption in group 324 at byte 784(6272): f in copy != ff on
disk/prealloc

Analysis reveals that this is a false positive. There is a specific race
window where the bitmap and the group descriptor become momentarily
inconsistent, leading to this error report:

ext4_mb_load_buddy                   ext4_mb_load_buddy
  __filemap_get_folio(create|lock)
    folio_lock
  ext4_mb_init_cache
    folio_mark_uptodate
                                     __filemap_get_folio(no lock)
                                     ......
                                     mb_mark_used
                                       mb_mark_used_double
  mb_cmp_bitmaps
                                       mb_set_bits(e4b->bd_bitmap)
  folio_unlock

The original logic assumed that since mb_cmp_bitmaps is called when the
bitmap is newly loaded from disk, the folio lock would be sufficient to
prevent concurrent access. However, this overlooks a specific race
condition: if another process attempts to load the buddy and finds the folio
already uptodate, it immediately begins using the folio without holding the
folio lock.
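
The interleaving above can be replayed deterministically in a
single-threaded userspace sketch. Everything here is hypothetical and
heavily simplified: struct model_buddy_folio, MODEL_BITMAP_BYTES and
replay_init_race() are illustration names, "copy" stands in for the
DOUBLE_CHECK copy updated by mb_mark_used_double, and "bitmap" stands in
for e4b->bd_bitmap.

```c
#include <stdbool.h>
#include <string.h>

#define MODEL_BITMAP_BYTES 8	/* arbitrary size for the model */

/* Hypothetical model of the buddy folio plus its DOUBLE_CHECK copy. */
struct model_buddy_folio {
	bool locked;
	bool uptodate;
	unsigned char bitmap[MODEL_BITMAP_BYTES];	/* models e4b->bd_bitmap */
	unsigned char copy[MODEL_BITMAP_BYTES];		/* models the debug copy */
};

/*
 * Replay the diagram step by step. Returns true if task A's comparison
 * (modelling mb_cmp_bitmaps) sees the two bitmaps disagree.
 */
static bool replay_init_race(bool fast_path_checks_lock)
{
	struct model_buddy_folio f;
	bool b_deferred;
	bool mismatch;

	memset(&f, 0, sizeof(f));

	/* Task A: slow path, lock the folio and initialise it. */
	f.locked = true;
	f.uptodate = true;	/* models folio_mark_uptodate() */

	/* Task B: fast-path attempt while A still holds the lock. */
	b_deferred = !f.uptodate || (fast_path_checks_lock && f.locked);
	if (!b_deferred)
		f.copy[0] |= 0x0f;	/* models mb_mark_used_double */

	/* Task A: compare the bitmap with its copy. */
	mismatch = memcmp(f.bitmap, f.copy, sizeof(f.bitmap)) != 0;

	/* Task B, if it took the fast path: now update the real bitmap. */
	if (!b_deferred)
		f.bitmap[0] |= 0x0f;	/* models mb_set_bits() */

	/* Task A: unlock. */
	f.locked = false;

	/* Task B, if deferred: it proceeds only after A has unlocked. */
	if (b_deferred) {
		f.copy[0] |= 0x0f;
		f.bitmap[0] |= 0x0f;
	}

	return mismatch;
}
```

In this model, replay_init_race(false) reports a mismatch because the
comparison lands between task B's two updates, while
replay_init_race(true) does not, because task B is held back until the
initialising task has unlocked the folio.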

Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com>
---
 fs/ext4/mballoc.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 56d50fd3310b..de4cacb740b3 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1706,16 +1706,17 @@ ext4_mb_load_buddy_gfp(struct super_block *sb, ext4_group_t group,
 
 	/* Avoid locking the folio in the fast path ... */
 	folio = __filemap_get_folio(inode->i_mapping, pnum, FGP_ACCESSED, 0);
-	if (IS_ERR(folio) || !folio_test_uptodate(folio)) {
+	if (IS_ERR(folio) || !folio_test_uptodate(folio) || folio_test_locked(folio)) {
+		/*
+		 * folio_test_locked is employed to detect ongoing folio
+		 * migrations, since concurrent migrations can lead to
+		 * bitmap inconsistency. And if we are not uptodate that
+		 * implies somebody just created the folio but is yet to
+		 * initialize it. We can drop the folio reference and
+		 * try to get the folio with lock in both cases to avoid
+		 * concurrency.
+		 */
 		if (!IS_ERR(folio))
-			/*
-			 * drop the folio reference and try
-			 * to get the folio with lock. If we
-			 * are not uptodate that implies
-			 * somebody just created the folio but
-			 * is yet to initialize it. So
-			 * wait for it to initialize.
-			 */
 			folio_put(folio);
 		folio = __filemap_get_folio(inode->i_mapping, pnum,
 				FGP_LOCK | FGP_ACCESSED | FGP_CREAT, gfp);
@@ -1764,7 +1765,7 @@ ext4_mb_load_buddy_gfp(struct super_block *sb, ext4_group_t group,
 
 	/* we need another folio for the buddy */
 	folio = __filemap_get_folio(inode->i_mapping, pnum, FGP_ACCESSED, 0);
-	if (IS_ERR(folio) || !folio_test_uptodate(folio)) {
+	if (IS_ERR(folio) || !folio_test_uptodate(folio) || folio_test_locked(folio)) {
 		if (!IS_ERR(folio))
 			folio_put(folio);
 		folio = __filemap_get_folio(inode->i_mapping, pnum,
-- 
2.39.2



* Re: [RFC PATCH] ext4: fix e4b bitmap inconsistency reports
  2026-01-06  9:08 [RFC PATCH] ext4: fix e4b bitmap inconsistency reports Yongjian Sun
@ 2026-01-06 10:56 ` Jan Kara
  2026-01-09  2:38 ` Baokun Li
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Jan Kara @ 2026-01-06 10:56 UTC (permalink / raw)
  To: sunyongjian1
  Cc: linux-ext4, linux-fsdevel, tytso, jack, yangerkun, yi.zhang,
	libaokun1, chengzhihao1

On Tue 06-01-26 17:08:20, Yongjian Sun wrote:
> From: Yongjian Sun <sunyongjian1@huawei.com>
> 
> A bitmap inconsistency issue was observed during stress tests under
> mixed huge-page workloads. Ext4 reported multiple e4b bitmap check
> failures like:
> 
> ext4_mb_complex_scan_group:2508: group 350, 8179 free clusters as
> per group info. But got 8192 blocks
> 
> Analysis and experimentation confirmed that the issue is caused by a
> race condition between page migration and bitmap modification. Although
> this timing window is extremely narrow, it is still hit in practice:
> 
> folio_lock                        ext4_mb_load_buddy
> __migrate_folio
>   check ref count
>   folio_mc_copy                     __filemap_get_folio
>                                       folio_try_get(folio)
>                                   ......
>                                   mb_mark_used
>                                   ext4_mb_unload_buddy
>   __folio_migrate_mapping
>     folio_ref_freeze
> folio_unlock
> 
> The root cause of this issue is that the fast path of load_buddy only
> increments the folio's reference count, which is insufficient to prevent
> concurrent folio migration. We observed that the folio migration process
> acquires the folio lock. Therefore, we can determine whether to take the
> fast path in load_buddy by checking the lock status. If the folio is
> locked, we opt for the slow path (which acquires the lock) to close this
> concurrency window.
> 
> Additionally, this change addresses the following issue:
> 
> When the DOUBLE_CHECK macro is enabled to inspect bitmap-related
> issues, the following error may be triggered:
> 
> corruption in group 324 at byte 784(6272): f in copy != ff on
> disk/prealloc
> 
> Analysis reveals that this is a false positive. There is a specific race
> window where the bitmap and the group descriptor become momentarily
> inconsistent, leading to this error report:
> 
> ext4_mb_load_buddy                   ext4_mb_load_buddy
>   __filemap_get_folio(create|lock)
>     folio_lock
>   ext4_mb_init_cache
>     folio_mark_uptodate
>                                      __filemap_get_folio(no lock)
>                                      ......
>                                      mb_mark_used
>                                        mb_mark_used_double
>   mb_cmp_bitmaps
>                                        mb_set_bits(e4b->bd_bitmap)
>   folio_unlock
> 
> The original logic assumed that since mb_cmp_bitmaps is called when the
> bitmap is newly loaded from disk, the folio lock would be sufficient to
> prevent concurrent access. However, this overlooks a specific race
> condition: if another process attempts to load the buddy and finds the folio
> already uptodate, it immediately begins using the folio without holding the
> folio lock.
> 
> Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com>

Nice catch! The fix looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/mballoc.c | 21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 56d50fd3310b..de4cacb740b3 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1706,16 +1706,17 @@ ext4_mb_load_buddy_gfp(struct super_block *sb, ext4_group_t group,
>  
>  	/* Avoid locking the folio in the fast path ... */
>  	folio = __filemap_get_folio(inode->i_mapping, pnum, FGP_ACCESSED, 0);
> -	if (IS_ERR(folio) || !folio_test_uptodate(folio)) {
> +	if (IS_ERR(folio) || !folio_test_uptodate(folio) || folio_test_locked(folio)) {
> +		/*
> +		 * folio_test_locked is employed to detect ongoing folio
> +		 * migrations, since concurrent migrations can lead to
> +		 * bitmap inconsistency. And if we are not uptodate that
> +		 * implies somebody just created the folio but is yet to
> +		 * initialize it. We can drop the folio reference and
> +		 * try to get the folio with lock in both cases to avoid
> +		 * concurrency.
> +		 */
>  		if (!IS_ERR(folio))
> -			/*
> -			 * drop the folio reference and try
> -			 * to get the folio with lock. If we
> -			 * are not uptodate that implies
> -			 * somebody just created the folio but
> -			 * is yet to initialize it. So
> -			 * wait for it to initialize.
> -			 */
>  			folio_put(folio);
>  		folio = __filemap_get_folio(inode->i_mapping, pnum,
>  				FGP_LOCK | FGP_ACCESSED | FGP_CREAT, gfp);
> @@ -1764,7 +1765,7 @@ ext4_mb_load_buddy_gfp(struct super_block *sb, ext4_group_t group,
>  
>  	/* we need another folio for the buddy */
>  	folio = __filemap_get_folio(inode->i_mapping, pnum, FGP_ACCESSED, 0);
> -	if (IS_ERR(folio) || !folio_test_uptodate(folio)) {
> +	if (IS_ERR(folio) || !folio_test_uptodate(folio) || folio_test_locked(folio)) {
>  		if (!IS_ERR(folio))
>  			folio_put(folio);
>  		folio = __filemap_get_folio(inode->i_mapping, pnum,
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [RFC PATCH] ext4: fix e4b bitmap inconsistency reports
  2026-01-06  9:08 [RFC PATCH] ext4: fix e4b bitmap inconsistency reports Yongjian Sun
  2026-01-06 10:56 ` Jan Kara
@ 2026-01-09  2:38 ` Baokun Li
  2026-01-09 10:33 ` Zhang Yi
  2026-01-28 18:05 ` Theodore Ts'o
  3 siblings, 0 replies; 5+ messages in thread
From: Baokun Li @ 2026-01-09  2:38 UTC (permalink / raw)
  To: sunyongjian1
  Cc: linux-ext4, linux-fsdevel, tytso, jack, yangerkun, yi.zhang,
	chengzhihao1

On 2026-01-06 17:08, Yongjian Sun wrote:
> From: Yongjian Sun <sunyongjian1@huawei.com>
>
> A bitmap inconsistency issue was observed during stress tests under
> mixed huge-page workloads. Ext4 reported multiple e4b bitmap check
> failures like:
>
> ext4_mb_complex_scan_group:2508: group 350, 8179 free clusters as
> per group info. But got 8192 blocks
>
> Analysis and experimentation confirmed that the issue is caused by a
> race condition between page migration and bitmap modification. Although
> this timing window is extremely narrow, it is still hit in practice:
>
> folio_lock                        ext4_mb_load_buddy
> __migrate_folio
>   check ref count
>   folio_mc_copy                     __filemap_get_folio
>                                       folio_try_get(folio)
>                                   ......
>                                   mb_mark_used
>                                   ext4_mb_unload_buddy
>   __folio_migrate_mapping
>     folio_ref_freeze
> folio_unlock
>
> The root cause of this issue is that the fast path of load_buddy only
> increments the folio's reference count, which is insufficient to prevent
> concurrent folio migration. We observed that the folio migration process
> acquires the folio lock. Therefore, we can determine whether to take the
> fast path in load_buddy by checking the lock status. If the folio is
> locked, we opt for the slow path (which acquires the lock) to close this
> concurrency window.
>
> Additionally, this change addresses the following issue:
>
> When the DOUBLE_CHECK macro is enabled to inspect bitmap-related
> issues, the following error may be triggered:
>
> corruption in group 324 at byte 784(6272): f in copy != ff on
> disk/prealloc
>
> Analysis reveals that this is a false positive. There is a specific race
> window where the bitmap and the group descriptor become momentarily
> inconsistent, leading to this error report:
>
> ext4_mb_load_buddy                   ext4_mb_load_buddy
>   __filemap_get_folio(create|lock)
>     folio_lock
>   ext4_mb_init_cache
>     folio_mark_uptodate
>                                      __filemap_get_folio(no lock)
>                                      ......
>                                      mb_mark_used
>                                        mb_mark_used_double
>   mb_cmp_bitmaps
>                                        mb_set_bits(e4b->bd_bitmap)
>   folio_unlock
>
> The original logic assumed that since mb_cmp_bitmaps is called when the
> bitmap is newly loaded from disk, the folio lock would be sufficient to
> prevent concurrent access. However, this overlooks a specific race
> condition: if another process attempts to load the buddy and finds the folio
> already uptodate, it immediately begins using the folio without holding the
> folio lock.
>
> Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Baokun Li <libaokun1@huawei.com>

> ---
>  fs/ext4/mballoc.c | 21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 56d50fd3310b..de4cacb740b3 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1706,16 +1706,17 @@ ext4_mb_load_buddy_gfp(struct super_block *sb, ext4_group_t group,
>  
>  	/* Avoid locking the folio in the fast path ... */
>  	folio = __filemap_get_folio(inode->i_mapping, pnum, FGP_ACCESSED, 0);
> -	if (IS_ERR(folio) || !folio_test_uptodate(folio)) {
> +	if (IS_ERR(folio) || !folio_test_uptodate(folio) || folio_test_locked(folio)) {
> +		/*
> +		 * folio_test_locked is employed to detect ongoing folio
> +		 * migrations, since concurrent migrations can lead to
> +		 * bitmap inconsistency. And if we are not uptodate that
> +		 * implies somebody just created the folio but is yet to
> +		 * initialize it. We can drop the folio reference and
> +		 * try to get the folio with lock in both cases to avoid
> +		 * concurrency.
> +		 */
>  		if (!IS_ERR(folio))
> -			/*
> -			 * drop the folio reference and try
> -			 * to get the folio with lock. If we
> -			 * are not uptodate that implies
> -			 * somebody just created the folio but
> -			 * is yet to initialize it. So
> -			 * wait for it to initialize.
> -			 */
>  			folio_put(folio);
>  		folio = __filemap_get_folio(inode->i_mapping, pnum,
>  				FGP_LOCK | FGP_ACCESSED | FGP_CREAT, gfp);
> @@ -1764,7 +1765,7 @@ ext4_mb_load_buddy_gfp(struct super_block *sb, ext4_group_t group,
>  
>  	/* we need another folio for the buddy */
>  	folio = __filemap_get_folio(inode->i_mapping, pnum, FGP_ACCESSED, 0);
> -	if (IS_ERR(folio) || !folio_test_uptodate(folio)) {
> +	if (IS_ERR(folio) || !folio_test_uptodate(folio) || folio_test_locked(folio)) {
>  		if (!IS_ERR(folio))
>  			folio_put(folio);
>  		folio = __filemap_get_folio(inode->i_mapping, pnum,




* Re: [RFC PATCH] ext4: fix e4b bitmap inconsistency reports
  2026-01-06  9:08 [RFC PATCH] ext4: fix e4b bitmap inconsistency reports Yongjian Sun
  2026-01-06 10:56 ` Jan Kara
  2026-01-09  2:38 ` Baokun Li
@ 2026-01-09 10:33 ` Zhang Yi
  2026-01-28 18:05 ` Theodore Ts'o
  3 siblings, 0 replies; 5+ messages in thread
From: Zhang Yi @ 2026-01-09 10:33 UTC (permalink / raw)
  To: sunyongjian1, linux-ext4
  Cc: linux-fsdevel, tytso, jack, yangerkun, libaokun1, chengzhihao1

On 1/6/2026 5:08 PM, Yongjian Sun wrote:
> From: Yongjian Sun <sunyongjian1@huawei.com>
> 
> A bitmap inconsistency issue was observed during stress tests under
> mixed huge-page workloads. Ext4 reported multiple e4b bitmap check
> failures like:
> 
> ext4_mb_complex_scan_group:2508: group 350, 8179 free clusters as
> per group info. But got 8192 blocks
> 
> Analysis and experimentation confirmed that the issue is caused by a
> race condition between page migration and bitmap modification. Although
> this timing window is extremely narrow, it is still hit in practice:
> 
> folio_lock                        ext4_mb_load_buddy
> __migrate_folio
>   check ref count
>   folio_mc_copy                     __filemap_get_folio
>                                       folio_try_get(folio)
>                                   ......
>                                   mb_mark_used
>                                   ext4_mb_unload_buddy
>   __folio_migrate_mapping
>     folio_ref_freeze
> folio_unlock
> 
> The root cause of this issue is that the fast path of load_buddy only
> increments the folio's reference count, which is insufficient to prevent
> concurrent folio migration. We observed that the folio migration process
> acquires the folio lock. Therefore, we can determine whether to take the
> fast path in load_buddy by checking the lock status. If the folio is
> locked, we opt for the slow path (which acquires the lock) to close this
> concurrency window.
> 
> Additionally, this change addresses the following issue:
> 
> When the DOUBLE_CHECK macro is enabled to inspect bitmap-related
> issues, the following error may be triggered:
> 
> corruption in group 324 at byte 784(6272): f in copy != ff on
> disk/prealloc
> 
> Analysis reveals that this is a false positive. There is a specific race
> window where the bitmap and the group descriptor become momentarily
> inconsistent, leading to this error report:
> 
> ext4_mb_load_buddy                   ext4_mb_load_buddy
>   __filemap_get_folio(create|lock)
>     folio_lock
>   ext4_mb_init_cache
>     folio_mark_uptodate
>                                      __filemap_get_folio(no lock)
>                                      ......
>                                      mb_mark_used
>                                        mb_mark_used_double
>   mb_cmp_bitmaps
>                                        mb_set_bits(e4b->bd_bitmap)
>   folio_unlock
> 
> The original logic assumed that since mb_cmp_bitmaps is called when the
> bitmap is newly loaded from disk, the folio lock would be sufficient to
> prevent concurrent access. However, this overlooks a specific race
> condition: if another process attempts to load the buddy and finds the folio
> already uptodate, it immediately begins using the folio without holding the
> folio lock.
> 
> Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com>

Well done! This is a problem that has been hidden for a long time.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>

> ---
>  fs/ext4/mballoc.c | 21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 56d50fd3310b..de4cacb740b3 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1706,16 +1706,17 @@ ext4_mb_load_buddy_gfp(struct super_block *sb, ext4_group_t group,
>  
>  	/* Avoid locking the folio in the fast path ... */
>  	folio = __filemap_get_folio(inode->i_mapping, pnum, FGP_ACCESSED, 0);
> -	if (IS_ERR(folio) || !folio_test_uptodate(folio)) {
> +	if (IS_ERR(folio) || !folio_test_uptodate(folio) || folio_test_locked(folio)) {
> +		/*
> +		 * folio_test_locked is employed to detect ongoing folio
> +		 * migrations, since concurrent migrations can lead to
> +		 * bitmap inconsistency. And if we are not uptodate that
> +		 * implies somebody just created the folio but is yet to
> +		 * initialize it. We can drop the folio reference and
> +		 * try to get the folio with lock in both cases to avoid
> +		 * concurrency.
> +		 */
>  		if (!IS_ERR(folio))
> -			/*
> -			 * drop the folio reference and try
> -			 * to get the folio with lock. If we
> -			 * are not uptodate that implies
> -			 * somebody just created the folio but
> -			 * is yet to initialize it. So
> -			 * wait for it to initialize.
> -			 */
>  			folio_put(folio);
>  		folio = __filemap_get_folio(inode->i_mapping, pnum,
>  				FGP_LOCK | FGP_ACCESSED | FGP_CREAT, gfp);
> @@ -1764,7 +1765,7 @@ ext4_mb_load_buddy_gfp(struct super_block *sb, ext4_group_t group,
>  
>  	/* we need another folio for the buddy */
>  	folio = __filemap_get_folio(inode->i_mapping, pnum, FGP_ACCESSED, 0);
> -	if (IS_ERR(folio) || !folio_test_uptodate(folio)) {
> +	if (IS_ERR(folio) || !folio_test_uptodate(folio) || folio_test_locked(folio)) {
>  		if (!IS_ERR(folio))
>  			folio_put(folio);
>  		folio = __filemap_get_folio(inode->i_mapping, pnum,



* Re: [RFC PATCH] ext4: fix e4b bitmap inconsistency reports
  2026-01-06  9:08 [RFC PATCH] ext4: fix e4b bitmap inconsistency reports Yongjian Sun
                   ` (2 preceding siblings ...)
  2026-01-09 10:33 ` Zhang Yi
@ 2026-01-28 18:05 ` Theodore Ts'o
  3 siblings, 0 replies; 5+ messages in thread
From: Theodore Ts'o @ 2026-01-28 18:05 UTC (permalink / raw)
  To: linux-ext4, Yongjian Sun
  Cc: Theodore Ts'o, linux-fsdevel, jack, yangerkun, yi.zhang,
	libaokun1, chengzhihao1, sunyongjian1


On Tue, 06 Jan 2026 17:08:20 +0800, Yongjian Sun wrote:
> A bitmap inconsistency issue was observed during stress tests under
> mixed huge-page workloads. Ext4 reported multiple e4b bitmap check
> failures like:
> 
> ext4_mb_complex_scan_group:2508: group 350, 8179 free clusters as
> per group info. But got 8192 blocks
> 
> [...]

Applied, thanks!

[1/1] ext4: fix e4b bitmap inconsistency reports
      commit: bdc56a9c46b2a99c12313122b9352b619a2e719e

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>

