public inbox for linux-btrfs@vger.kernel.org
* [PATCH] generic/747: handle ENOSPC gracefully during write/delete cycles
@ 2026-03-24 23:33 Leo Martins
  2026-03-24 23:39 ` Leo Martins
  2026-03-25  6:23 ` Qu Wenruo
  0 siblings, 2 replies; 4+ messages in thread
From: Leo Martins @ 2026-03-24 23:33 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

generic/747 frequently fails on btrfs in my fstests setup, with an
~88% failure rate across multiple runs on kernels ranging from v6.9 to
v7.0-rc5. This is not a regression: the failure has been present since
the test was added.

The test fills a filesystem to 95%, then performs mixed write/delete
cycles, using statfs to decide whether to write or delete. However,
statfs f_bavail may overestimate the actual available space. On btrfs,
the statfs implementation documents its estimate as "a close
approximation" (fs/btrfs/super.c). At high fill levels, the discrepancy
between what statfs reports and what the filesystem can actually
allocate becomes significant, causing dd to hit ENOSPC even though
statfs indicated there was room.
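As an illustration of why a slightly optimistic f_bavail matters at a 95% threshold, here is a minimal sketch of a statfs-based "used percent" check. It assumes the percentage is derived from f_blocks and f_bavail, which may not match the real _used_percent helper (its body is not part of this patch). used_percent_from_counts and used_percent_of are hypothetical names, and the stat format letters assume GNU coreutils stat.

```shell
# Hypothetical helpers for illustration only; the real _used_percent in
# generic/747 is not shown in this patch and may compute this differently.

# Used percentage from raw statfs counts: f_blocks and f_bavail.
# Integer division truncates, so 94.9% used comes out as 94.
used_percent_from_counts() {
	local blocks=$1 bavail=$2

	if [ "$blocks" -eq 0 ]; then
		echo 0
		return
	fi
	echo $(( (blocks - bavail) * 100 / blocks ))
}

# Same calculation for a mounted directory. With GNU stat -f,
# %b is f_blocks and %a is f_bavail (free blocks for non-root users).
used_percent_of() {
	set -- $(stat -f -c '%b %a' "$1")
	used_percent_from_counts "$1" "$2"
}
```

With integer division, 94.9% used is reported as 94, so a loop that writes while the value is below 95 keeps allocating right up against the limit, which is where any overestimate in f_bavail surfaces as ENOSPC.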

This is not a filesystem bug. The filesystem correctly rejects the
write when it cannot reserve space. The test's purpose is to stress
garbage collection through write/delete churn, not to validate space
accounting.

Handle ENOSPC by cleaning up the partial file and making room:

- In _direct_fillup: break out of the fill loop (we're full enough).
- In _mixed_write_delete: delete a file to free space and retry. If
  writes fail 10 consecutive times, _fail the test, since that indicates
  a real filesystem issue rather than a transient statfs discrepancy.
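The bounded-retry policy for _mixed_write_delete can be sketched as a standalone helper. This is a simplified, hypothetical sketch (try_write_with_cleanup is not an fstests function): unlike the patch, it retries the write directly instead of re-checking _used_percent on each iteration, and it does not run the cleanup step after the final failure.

```shell
# Hypothetical helper, not part of fstests. Runs write_cmd; on failure runs
# cleanup_cmd (e.g. "delete a random file") and tries again. Gives up after
# max_retries consecutive failures, similar to the patch's
# max_enospc_retries=10 policy.
try_write_with_cleanup() {
	local max_retries=$1 write_cmd=$2 cleanup_cmd=$3
	local failures=0

	# write_cmd and cleanup_cmd run in the current shell, so they may
	# update global state (counters, file sequence numbers, ...).
	until $write_cmd; do
		failures=$((failures + 1))
		if [ "$failures" -ge "$max_retries" ]; then
			echo "giving up after $failures consecutive failures" >&2
			return 1
		fi
		# Free some space before the next attempt.
		$cleanup_cmd
	done
	return 0
}
```

In generic/747 terms, the cleanup command would correspond to _delete_random_file; _create_file takes arguments, so it would need a small wrapper function to be passed as write_cmd.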

Redirect dd stderr to seqres.full so errors are preserved for
debugging without polluting the expected output.
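The stderr redirection and the partial-file cleanup can be shown in isolation. A minimal sketch, assuming GNU dd (for status=none); create_file_logged and write_or_cleanup are hypothetical names and the 4096-byte block size is arbitrary. It deliberately uses rm -f: a plain rm would print an error of its own if dd failed before the file was created (for example, if the open itself failed), and that message would presumably end up in the test output.

```shell
# Hypothetical sketch of the _create_file pattern: dd's exit status becomes
# the function's return value, and dd's error messages are appended to a
# log file instead of going to the test's stdout/stderr.
create_file_logged() {
	local file=$1 size=$2 log=$3 bs=4096

	dd if=/dev/zero of="$file" bs=$bs count=$((size / bs)) \
		status=none 2>>"$log"
}

# Caller side: on failure, remove whatever partial file dd left behind.
# rm -f stays quiet if dd never managed to create the file.
write_or_cleanup() {
	local file=$1 size=$2 log=$3

	if ! create_file_logged "$file" "$size" "$log"; then
		rm -f "$file"
		return 1
	fi
}
```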

Signed-off-by: Leo Martins <loemra.dev@gmail.com>
---
 tests/generic/747 | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/tests/generic/747 b/tests/generic/747
index 44834186..35de3ccb 100755
--- a/tests/generic/747
+++ b/tests/generic/747
@@ -35,11 +35,7 @@ _create_file() {
 
 	POSIXLY_CORRECT=yes dd if=/dev/zero of=${file_name} \
 		bs=${bs} count=$(( $file_sz / ${bs} )) \
-		status=none $dd_extra  2>&1
-
-	if [ $? -ne 0 ]; then
-		_fail "Failed writing $file_name"
-	fi
+		status=none $dd_extra  2>>$seqres.full
 }
 
 _total_M() {
@@ -69,7 +65,10 @@ _direct_fillup () {
 	while [ $(_used_percent) -lt $fill_percent ]; do
 		local fsz=$(_get_random_fsz)
 
-		_create_file $testseq $fsz "oflag=direct conv=fsync"
+		if ! _create_file $testseq $fsz "oflag=direct conv=fsync"; then
+			rm ${SCRATCH_MNT}/data_${testseq}
+			break
+		fi
 		testseq=$((${testseq} + 1))
 	done
 }
@@ -79,14 +78,25 @@ _mixed_write_delete() {
 	local total_M=$(_total_M)
 	local to_write_M=$(( ${overwrite_percentage} * ${total_M} / 100 ))
 	local written_M=0
+	local enospc_retries=0
+	local max_enospc_retries=10
 
 	while [ $written_M -lt $to_write_M ]; do
 		if [ $(_used_percent) -lt $fill_percent ]; then
 			local fsz=$(_get_random_fsz)
 
-			_create_file $testseq $fsz "$dd_extra"
-			written_M=$((${written_M} + ${fsz}/${M}))
-			testseq=$((${testseq} + 1))
+			if ! _create_file $testseq $fsz "$dd_extra"; then
+				rm ${SCRATCH_MNT}/data_${testseq}
+				_delete_random_file
+				enospc_retries=$((enospc_retries + 1))
+				if [ $enospc_retries -ge $max_enospc_retries ]; then
+					_fail "failed to write after $max_enospc_retries consecutive ENOSPC attempts"
+				fi
+			else
+				written_M=$((${written_M} + ${fsz}/${M}))
+				testseq=$((${testseq} + 1))
+				enospc_retries=0
+			fi
 		else
 			_delete_random_file
 		fi
-- 
2.52.0



* Re: [PATCH] generic/747: handle ENOSPC gracefully during write/delete cycles
  2026-03-24 23:33 [PATCH] generic/747: handle ENOSPC gracefully during write/delete cycles Leo Martins
@ 2026-03-24 23:39 ` Leo Martins
  2026-03-25  5:50   ` Christoph Hellwig
  2026-03-25  6:23 ` Qu Wenruo
  1 sibling, 1 reply; 4+ messages in thread
From: Leo Martins @ 2026-03-24 23:39 UTC (permalink / raw)
  To: Leo Martins; +Cc: linux-btrfs, kernel-team, fstests

On Tue, 24 Mar 2026 16:33:26 -0700 Leo Martins <loemra.dev@gmail.com> wrote:

Forgot to CC the fstests mailing list.


* Re: [PATCH] generic/747: handle ENOSPC gracefully during write/delete cycles
  2026-03-24 23:39 ` Leo Martins
@ 2026-03-25  5:50   ` Christoph Hellwig
  0 siblings, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2026-03-25  5:50 UTC (permalink / raw)
  To: Leo Martins; +Cc: linux-btrfs, kernel-team, fstests

On Tue, Mar 24, 2026 at 04:39:43PM -0700, Leo Martins wrote:
> > The test fills a filesystem to 95% then does mixed write/delete cycles,
> > using statfs to decide whether to write or delete. However, statfs
> > f_bavail may overestimate the actual available space. On btrfs, the
> > statfs implementation documents its estimate as "a close approximation"
> > (fs/btrfs/super.c). At high fill levels the discrepancy between what
> > statfs reports and what the filesystem can actually allocate becomes
> > significant, causing dd to hit ENOSPC even though statfs indicated
> > there was room.
> > This is not a filesystem bug. The filesystem correctly rejects the
> > write when it cannot reserve space. The test's purpose is to stress
> > garbage collection through write/delete churn, not to validate space
> > accounting.

f_bavail overestimating is a btrfs implementation bug that needs fixing.
It can be wrong, but when in doubt it needs to be too low.  Without this
you're going to cause unexpected ENOSPC for real-life applications as
well (and btrfs does have a bit of a reputation for that, so fixing
this properly will benefit your users!)
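The rule that f_bavail should err low can be phrased as a checkable invariant: a write no larger than the space statfs reported as available should not fail. A minimal sketch under stated assumptions: bavail_bytes and classify_write are hypothetical helpers, GNU stat is assumed (%a is f_bavail, %S is the fundamental block size f_frsize), any write failure is treated as ENOSPC, and metadata overhead is ignored, so this is only a rough detector.

```shell
# Hypothetical helper: bytes a non-root writer may allocate according to
# statfs, i.e. f_bavail * f_frsize (GNU stat -f: %a and %S).
bavail_bytes() {
	set -- $(stat -f -c '%a %S' "$1")
	echo $(( $1 * $2 ))
}

# Hypothetical helper: classify one write attempt against the statfs
# estimate taken just before it. A failed write that was no larger than
# the reported free space means the estimate was too high.
classify_write() {
	local requested=$1 avail=$2 status=$3

	if [ "$status" -eq 0 ]; then
		echo ok
	elif [ "$requested" -le "$avail" ]; then
		echo overestimate
	else
		echo expected-enospc
	fi
}
```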



* Re: [PATCH] generic/747: handle ENOSPC gracefully during write/delete cycles
  2026-03-24 23:33 [PATCH] generic/747: handle ENOSPC gracefully during write/delete cycles Leo Martins
  2026-03-24 23:39 ` Leo Martins
@ 2026-03-25  6:23 ` Qu Wenruo
  1 sibling, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2026-03-25  6:23 UTC (permalink / raw)
  To: Leo Martins, linux-btrfs, kernel-team



On 2026/3/25 10:03, Leo Martins wrote:
> generic/747 consistently fails on btrfs in my fstests setup, with an
> ~88% failure rate across multiple runs on kernels ranging from v6.9 to
> v7.0-rc5. This is not a regression but a pre-existing issue since the
> test was added.

Glad we finally have someone trying to fix this.

> 
> The test fills a filesystem to 95% then does mixed write/delete cycles,
> using statfs to decide whether to write or delete. However, statfs
> f_bavail may overestimate the actual available space.

If this is the problem, then I believe we should fix our f_bavail 
calculation.

> On btrfs, the
> statfs implementation documents its estimate as "a close approximation"
> (fs/btrfs/super.c). At high fill levels the discrepancy between what
> statfs reports and what the filesystem can actually allocate becomes
> significant, causing dd to hit ENOSPC even though statfs indicated
> there was room.

From what I can see, there are several ways f_bavail can be
overestimated:

- btrfs_calc_available_data_space()
   That function goes through each device and tries to find unallocated
   space.
   This cannot handle uneven device sizes (e.g. 1G + 10G RAID0); it
   should go through the per-profile estimation instead, which handles
   uneven disks much better.

   Normally this shouldn't affect a single device, but we can still
   overestimate by counting the reserved 1M bytes as available.

- total_free_data calculation
   It's just the on-disk size minus the on-disk used space, which
   doesn't take things like reserved space into consideration.

   So I suspect the problem comes mostly from this part.

Would you mind giving this idea a try, to fix the long-failing generic/747?

Thanks,
Qu


Thread overview: 4+ messages
2026-03-24 23:33 [PATCH] generic/747: handle ENOSPC gracefully during write/delete cycles Leo Martins
2026-03-24 23:39 ` Leo Martins
2026-03-25  5:50   ` Christoph Hellwig
2026-03-25  6:23 ` Qu Wenruo
