* [RFC] generic/301: flaky failure on btrfs after metadata overcommit change
@ 2026-03-23 20:15 Leo Martins
2026-03-23 22:51 ` Darrick J. Wong
2026-03-24 3:03 ` Qu Wenruo
0 siblings, 2 replies; 3+ messages in thread
From: Leo Martins @ 2026-03-23 20:15 UTC (permalink / raw)
To: fstests; +Cc: linux-btrfs, Filipe Manana, Darrick J . Wong
Hi,
generic/301 has become flaky on btrfs after commit 0dc118b3c327 ("btrfs:
be less aggressive with metadata overcommit when we can do full
flushing") which landed in btrfs/for-next. Out of 30 runs, 8 fail with:
    +file2 badly fragmented
I bisected this to the above commit, which reduces the metadata
overcommit limit from 1/8th to 1/64th of available space for
full-flushing contexts. This is a legitimate fix for -ENOSPC transaction
aborts on small filesystems, but as a side effect it causes more
frequent transaction commits during writeback. The reduced batching
means the extent allocator has less opportunity to coalesce adjacent CoW
extents, resulting in higher extent counts that sometimes cross the
test's threshold.
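To put rough numbers on the 1/8th versus 1/64th change, here is a small arithmetic sketch. The overcommit_limit helper and the 1 GiB "available space" figure are made up for illustration and are not taken from the kernel code or from my test setup; the only facts from the commit are the two divisors.

```shell
#!/bin/bash
# Hypothetical helper: overcommit ceiling as a fraction of available space.
#   $1 = available bytes (illustrative value, not measured)
#   $2 = divisor (8 before the commit, 64 after it)
overcommit_limit() {
    echo $(( $1 / $2 ))
}

avail=$((1024 * 1024 * 1024))    # 1 GiB, assumed only for this example

echo "old limit (1/8):  $(overcommit_limit "$avail" 8) bytes"
echo "new limit (1/64): $(overcommit_limit "$avail" 64) bytes"
```

Under that assumption the ceiling drops from 128 MiB to 16 MiB, i.e. by a factor of 8, which is consistent with flushing and commits kicking in earlier.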
The fragmentation check in question is:
    test $new_extents -lt $((internal_blks * 2 / 3)) || echo "file2 badly fragmented"
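For reference, a minimal sketch of how that arithmetic behaves. The frag_threshold and is_badly_fragmented helpers are hypothetical names that exist only in this example, and the block count of 100 is an assumed value, not what the test actually uses. Note that shell arithmetic truncates, so the cutoff is the floor of two thirds of the block count, and an extent count equal to the cutoff already counts as a failure.

```shell
#!/bin/bash
# Hypothetical helpers mirroring the check's arithmetic; not part of fstests.

# Integer threshold: floor(internal_blks * 2 / 3).
frag_threshold() {
    echo $(( $1 * 2 / 3 ))
}

# Succeeds (exit 0) when the check WOULD print "file2 badly fragmented",
# i.e. when new_extents is not strictly below the threshold.
#   $1 = new_extents, $2 = internal_blks
is_badly_fragmented() {
    ! test "$1" -lt "$(frag_threshold "$2")"
}

internal_blks=100                 # assumed value for illustration
echo "threshold: $(frag_threshold "$internal_blks")"
for n in 65 66; do
    if is_badly_fragmented "$n" "$internal_blks"; then
        echo "$n extents: badly fragmented"
    else
        echo "$n extents: ok"
    fi
done
```

With 100 blocks the threshold is 66, so 65 extents pass and 66 fail; a modest increase in extent count is enough to flip the result.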
The 2/3 threshold was introduced in 9184ca155d7c ("xfs: test
fragmentation characteristics of copy-on-write") as part of a series
testing XFS's CoW extent size hint (cowextsize) mechanism. For btrfs,
this threshold is arbitrary — btrfs doesn't have XFS's cowextsize hint,
and its CoW extent allocation depends on factors like transaction commit
frequency and metadata reservation behavior, which is exactly what the
overcommit commit changed.
I see two possible fixes and would appreciate input on which is
preferred:
Option A: _notrun for btrfs
----------------------------
Skip the entire test since the fragmentation threshold is not applicable
to btrfs:
    test $FSTYP = "btrfs" && \
        _notrun "CoW fragmentation threshold not applicable to btrfs"
Option B: Skip only the extent count assertion for btrfs
---------------------------------------------------------
Keep the CoW + data integrity portion of the test (the md5sum checks
after random CoW writes and remount are still useful) and only skip the
fragmentation assertion:
    if [ "$FSTYP" != "btrfs" ]; then
        test $new_extents -lt $((internal_blks * 2 / 3)) || \
            echo "file2 badly fragmented"
    fi
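To make the intended behavior of option B easy to check outside the test harness, here is a standalone sketch. The check_fragmentation wrapper and its positional arguments are hypothetical and only stand in for the test's $FSTYP, $new_extents and $internal_blks variables; the real change would just wrap the existing line as shown above.

```shell
#!/bin/bash
# Hypothetical wrapper around the option B logic; not part of fstests.
#   $1 = filesystem type, $2 = new_extents, $3 = internal_blks
check_fragmentation() {
    local fstyp=$1 new_extents=$2 internal_blks=$3

    # Only enforce the 2/3 threshold on filesystems other than btrfs.
    if [ "$fstyp" != "btrfs" ]; then
        test "$new_extents" -lt $((internal_blks * 2 / 3)) || \
            echo "file2 badly fragmented"
    fi
}

# Illustrative values: 90 extents against 100 blocks is above the threshold.
check_fragmentation xfs 90 100      # prints the failure message
check_fragmentation btrfs 90 100    # prints nothing, assertion skipped
check_fragmentation xfs 10 100      # prints nothing, below the threshold
```

Since fstests compares test output against the golden output, printing nothing on btrfs means the fragmentation assertion can no longer fail there, while the md5sum output is still compared as before.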
I lean towards option B since the CoW write + remount + md5sum
verification is still a reasonable smoke test, but option A is cleaner
if the consensus is that this test isn't adding value for btrfs.
Thoughts?
Thanks,
Leo Martins
* Re: [RFC] generic/301: flaky failure on btrfs after metadata overcommit change
From: Darrick J. Wong @ 2026-03-23 22:51 UTC (permalink / raw)
To: Leo Martins; +Cc: fstests, linux-btrfs, Filipe Manana
On Mon, Mar 23, 2026 at 01:15:29PM -0700, Leo Martins wrote:
> Thoughts?
B, since it's checking data integrity across a mount cycle.
--D
* Re: [RFC] generic/301: flaky failure on btrfs after metadata overcommit change
From: Qu Wenruo @ 2026-03-24 3:03 UTC (permalink / raw)
To: Leo Martins, fstests; +Cc: linux-btrfs, Filipe Manana, Darrick J . Wong
On 2026/3/24 06:45, Leo Martins wrote:
> I lean towards option B since the CoW write + remount + md5sum
> verification is still a reasonable smoke test, but option A is cleaner
> if the consensus is that this test isn't adding value for btrfs.
I also agree with option B.
Fragmentation behavior is really specific to each filesystem, so leaving
the extent count check to XFS and keeping the content check looks good
to me.
Thanks,
Qu