Re: [PATCH 2/7] generic/778: fix severe performance problems

public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed

From: "Darrick J. Wong" <djwong@kernel.org>
To: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Cc: zlang@redhat.com, fstests@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [PATCH 2/7] generic/778: fix severe performance problems
Date: Mon, 1 Dec 2025 15:19:08 -0800	[thread overview]
Message-ID: <20251201231908.GB89492@frogsfrogsfrogs> (raw)
In-Reply-To: <aSq0OzvJHT1yOdvF@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com>

On Sat, Nov 29, 2025 at 02:22:11PM +0530, Ojaswin Mujoo wrote:
> On Fri, Nov 14, 2025 at 06:32:51PM -0800, Darrick J. Wong wrote:
> > On Thu, Nov 13, 2025 at 05:23:45PM +0530, Ojaswin Mujoo wrote:
> > > On Mon, Nov 10, 2025 at 10:26:32AM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > This test takes 4800s to run, which is horrible.  AFAICT it starts out
> > > > by timing how much can be written atomically to a new file in 0.2
> > > > seconds, then scales up the file size by 3x.  On not very fast storage,
> > > 
> > > Hi Darrick,
> 
> (Sorry I missed this email somehow)
> 
> > > 
> > > So 250MB in 0.2s is like 1.2GBps which seems pretty fast. Did you mean
> > > "On fast storage ..." ?
> > 
> > No, I have even faster storage. ;)
> 
> :O
> 
> So that means on an even faster storage this problem would be even more
> visible because our file size would be >250MB
> 
> > 
> > > > this can result in file_size being set to ~250MB on a 4k fsblock
> > > > filesystem.  That's about 64,000 blocks.
> > > > 
> > > > The next thing this test does is try to create a file of that size
> > > > (250MB) of alternating written and unwritten blocks.  For some reason,
> > > > it sets up this file by invoking xfs_io 64,000 times to write small
> > > > amounts of data, which takes 3+ minutes on the author's system because
> > > > exec overhead is pretty high when you do that.
> > > 
> > > > 
> > > > As a result, one loop through the test takes almost 4 minutes.  The test
> > > > loops 20 times, so it runs for 80 minutes(!!) which is a really long
> > > > time.
> > > > 
> > > > So the first thing we do is observe that the giant slow loop is being
> > > > run as a single thread on an empty filesystem.  Most of the time the
> > > > allocator generates a mostly physically contiguous file.  We could
> > > > fallocate the whole file instead of fallocating one block every other
> > > > time through the loop.  This halves the setup time.
> > > > 
> > > > Next, we can also stuff the remaining pwrite commands into a bash array
> > > > and only invoke xfs_io once every 128x through the loop.  This amortizes
> > > > the xfs_io startup time, which reduces the test loop runtime to about 20
> > > > seconds.
> > > 
> > > Oh right, this is very bad. Weirdly I never noticed the test taking such
> > > a huge time while testing on scsi_debug and also on an enterprise SSD.
> > 
> > It doesn't help that xfs supports much larger awu_max than (say) ext4.
> 
> I did test on xfs as well. But yea maybe my SSD is just not fast enough.
> 
> > 
> > > Thanks for fixing this up though, I will start using maybe dm-delay
> > > while stressing the tests in the future to avoid such issues.
> > 
> > fork() is a bit expensive.
> > 
> > > > 
> > > > Finally, replace the 20x loop with a _soak_loop_running 5x loop because
> > > > 5 seems like enough.  Anyone who wants more can set TIME_FACTOR or
> > > > SOAK_DURATION to get more intensive testing.  On my system this cuts the
> > > > runtime to 75 seconds.
> > > 
> > > So about the loops, we were running a modified version of this test,
> > > which used non atomic writes, to confirm if we are able to catch torn
> > > writes this way. We noticed that it sometimes took 10+ loops to observe
> > > the torn write. Hence we kept iters=20. Since catching a torn write is
> > > critical for working of atomic writes, I think it might make sense to
> > > leave it at 20. If we feel this is a very high value, we can perhaps
> > > remove -g auto and keep -g stress -g atomicwrites so only people who
> > > explicitly want to stress atomic writes will run it.
> > 
> > In that case we ought to limit the awu_max that we feed to the test
> > because otherwise it starts running a lot of IO.
> 
> Yes I think that makes sense. Right now we get awu_max of 4M on xfs that
> means we always end up only testing software atomic writes.  Maybe we
> can instead cap awu_max at 64K or something. This way, we can test both
> hw atomic writes (when device supports it) and sw atomic writes (when it
> doesn't)

Yeah, capping the testing block size sounds like a good idea.  What do
you think about using min(awu_max_opt * 2, awu_max) ?

--D

> Regards,
> ojaswin
> 
> > 
> > --D
> > 
> > > > 
> > > > Cc: <fstests@vger.kernel.org> # v2025.10.20
> > > > Fixes: ca954527ff9d97 ("generic: Add sudden shutdown tests for multi block atomic writes")
> > > > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> > > > ---
> > > >  tests/generic/778 |   59 ++++++++++++++++++++++++++++++++++++-----------------
> > > >  1 file changed, 40 insertions(+), 19 deletions(-)
> > > > 
> > > > 
> > > > diff --git a/tests/generic/778 b/tests/generic/778
> > > > index 8cb1d8d4cad45d..7cfabc3a47a521 100755
> > > > --- a/tests/generic/778
> > > > +++ b/tests/generic/778
> > > > @@ -42,22 +42,28 @@ atomic_write_loop() {
> > > >  		# Due to sudden shutdown this can produce errors so just
> > > >  		# redirect them to seqres.full
> > > >  		$XFS_IO_PROG -c "open -fsd $testfile" -c "pwrite -S 0x61 -DA -V1 -b $size $off $size" >> /dev/null 2>>$seqres.full
> > > > -		echo "Written to offset: $off" >> $tmp.aw
> > > > -		off=$((off + $size))
> > > > +		echo "Written to offset: $((off + size))" >> $tmp.aw
> > > > +		off=$((off + size))
> > > >  	done
> > > >  }
> > > >  
> > > >  start_atomic_write_and_shutdown() {
> > > >  	atomic_write_loop &
> > > >  	awloop_pid=$!
> > > > +	local max_loops=100
> > > >  
> > > >  	local i=0
> > > > -	# Wait for at least first write to be recorded or 10s
> > > > -	while [ ! -f "$tmp.aw" -a $i -le 50 ]; do i=$((i + 1)); sleep 0.2; done
> > > > +	# Wait for at least first write to be recorded or too much time passes
> > > > +	while [ ! -f "$tmp.aw" -a $i -le $max_loops ]; do
> > > > +		i=$((i + 1))
> > > > +		sleep 0.2
> > > > +	done
> > > >  
> > > > -	if [[ $i -gt 50 ]]
> > > > +	cat $tmp.aw >> $seqres.full
> > > > +
> > > > +	if [[ $i -gt $max_loops ]]
> > > >  	then
> > > > -		_fail "atomic write process took too long to start"
> > > > +		_notrun "atomic write process took too long to start"
> > > >  	fi
> > > >  
> > > >  	echo >> $seqres.full
> > > > @@ -113,21 +119,34 @@ create_mixed_mappings() {
> > > >  	local off=0
> > > >  	local operations=("W" "U")
> > > >  
> > > > +	test $size_bytes -eq 0 && return
> > > > +
> > > > +	# fallocate the whole file once because preallocating single blocks
> > > > +	# with individual xfs_io invocations is really slow and the allocator
> > > > +	# usually gives out consecutive blocks anyway
> > > > +	$XFS_IO_PROG -f -c "falloc 0 $size_bytes" $file
> > > > +
> > > > +	local cmds=()
> > > >  	for ((i=0; i<$((size_bytes / blksz )); i++)); do
> > > > -		index=$(($i % ${#operations[@]}))
> > > > -		map="${operations[$index]}"
> > > > +		if (( i % 2 == 0 )); then
> > > > +			cmds+=(-c "pwrite -b $blksz $off $blksz")
> > > > +		fi
> > > > +
> > > > +		# batch the write commands into larger xfs_io invocations to
> > > > +		# amortize the fork overhead
> > > > +		if [ "${#cmds[@]}" -ge 128 ]; then
> > > > +			$XFS_IO_PROG "${cmds[@]}" "$file" >> /dev/null
> > > > +			cmds=()
> > > > +		fi
> > > >  
> > > > -		case "$map" in
> > > > -		    "W")
> > > > -			$XFS_IO_PROG -fc "pwrite -b $blksz $off $blksz" $file  >> /dev/null
> > > > -			;;
> > > > -		    "U")
> > > > -			$XFS_IO_PROG -fc "falloc $off $blksz" $file >> /dev/null
> > > > -			;;
> > > > -		esac
> > > >  		off=$((off + blksz))
> > > >  	done
> > > >  
> > > > +	if [ "${#cmds[@]}" -gt 0 ]; then
> > > > +		$XFS_IO_PROG "${cmds[@]}" "$file" >> /dev/null
> > > > +		cmds=()
> > > > +	fi
> > > > +
> > > >  	sync $file
> > > >  }
> > > >  
> > > > @@ -336,9 +355,9 @@ echo >> $seqres.full
> > > >  echo "# Populating expected data buffers" >> $seqres.full
> > > >  populate_expected_data
> > > >  
> > > > -# Loop 20 times to shake out any races due to shutdown
> > > > -for ((iter=0; iter<20; iter++))
> > > > -do
> > > > +# Loop to shake out any races due to shutdown
> > > > +iter=0
> > > > +while _soak_loop_running $TIME_FACTOR; do
> > > >  	echo >> $seqres.full
> > > >  	echo "------ Iteration $iter ------" >> $seqres.full
> > > >  
> > > > @@ -361,6 +380,8 @@ do
> > > >  	echo >> $seqres.full
> > > >  	echo "# Starting shutdown torn write test for append atomic writes" >> $seqres.full
> > > >  	test_append_torn_write
> > > > +
> > > > +	iter=$((iter + 1))
> > > >  done
> > > >  
> > > >  echo "Silence is golden"
> > > > 
> > >

next prev parent reply	other threads:[~2025-12-01 23:19 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-10 18:26 [PATCHSET] fstests: more random fixes for v2025.11.04 Darrick J. Wong
2025-11-10 18:26 ` [PATCH 1/7] common: leave any breadcrumbs when _link_out_file_named can't find the output file Darrick J. Wong
2025-11-11  9:13   ` Christoph Hellwig
2025-11-10 18:26 ` [PATCH 2/7] generic/778: fix severe performance problems Darrick J. Wong
2025-11-11  9:27   ` Christoph Hellwig
2025-11-13 11:53   ` Ojaswin Mujoo
2025-11-15  2:32     ` Darrick J. Wong
2025-11-29  8:52       ` Ojaswin Mujoo
2025-12-01 23:19         ` Darrick J. Wong [this message]
2025-12-03  6:19           ` Ojaswin Mujoo
2025-12-04 17:18             ` Darrick J. Wong
2025-11-10 18:26 ` [PATCH 3/7] generic/778: fix background loop control with sentinel files Darrick J. Wong
2025-11-11  9:28   ` Christoph Hellwig
2025-11-13 11:06   ` Ojaswin Mujoo
2025-11-10 18:27 ` [PATCH 4/7] generic/019: skip test when there is no journal Darrick J. Wong
2025-11-11  9:28   ` Christoph Hellwig
2025-11-12 18:20     ` Darrick J. Wong
2025-11-10 18:27 ` [PATCH 5/7] xfs/837: fix test to work with pre-metadir quota mount options Darrick J. Wong
2025-11-11  9:29   ` Christoph Hellwig
2025-11-10 18:27 ` [PATCH 6/7] generic/774: reduce file size Darrick J. Wong
2025-11-11  9:13   ` John Garry
2025-11-11  9:29     ` Christoph Hellwig
2025-11-11  9:31   ` Christoph Hellwig
2025-11-12 18:05     ` Darrick J. Wong
2025-11-13 10:44   ` Ojaswin Mujoo
2025-11-10 18:27 ` [PATCH 7/7] generic/774: turn off lfsr Darrick J. Wong
2025-11-11  9:01   ` John Garry
2025-11-12 18:15     ` Darrick J. Wong
2025-11-11  9:32   ` Christoph Hellwig
2025-11-13 10:34   ` Ojaswin Mujoo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251201231908.GB89492@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=fstests@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=ojaswin@linux.ibm.com \
    --cc=zlang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox