public inbox for fstests@vger.kernel.org
* [PATCH] generic/166: speed up on slow disks
@ 2017-10-11 23:15 Dave Chinner
  2017-10-11 23:41 ` Darrick J. Wong
  0 siblings, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2017-10-11 23:15 UTC (permalink / raw)
  To: fstests

From: Dave Chinner <dchinner@redhat.com>

generic/166 takes way too long to run on iscsi disks - over an *hour* on
flash based iscsi targets. In comparison, it takes 18s to run on a pmem device.

The issue is that it takes 3-4s per file write cycle on slow disks, and it does
a thousand write cycles. The problem is that reflink is so much faster than the
write cycle that we end up doing many more snapshots on slow disks than on
fast disks, and this slows the test down even more.

e.g. the pmem system that takes 18s to run does just under 1000 snapshots -
roughly one per file write. 20 minutes into the iscsi based test, it had only
done ~300 write cycles but almost 10,000 snapshots had been taken. IOWs,
we're doing ~30 snapshots per file write, not ~1.

Fix this by rate limiting snapshots to at most one per whole file write. This
reduces the number of snapshots taken on fast devices by ~50% (runtime on the
pmem device went from 18s to 8s), and on slow devices it caps the count at
~1000 snapshots, reducing runtime from 3671s to just 311s.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 tests/generic/166 | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tests/generic/166 b/tests/generic/166
index 8600a133f2d3..9b53307b761c 100755
--- a/tests/generic/166
+++ b/tests/generic/166
@@ -55,6 +55,7 @@ _scratch_mount >> $seqres.full 2>&1
 
 testdir=$SCRATCH_MNT/test-$seq
 finished_file=/tmp/finished
+do_snapshot=/tmp/snapshot
 rm -rf $finished_file
 mkdir $testdir
 
@@ -68,15 +69,24 @@ _pwrite_byte 0x61 0 $((loops * blksz)) $testdir/file1 >> $seqres.full
 _scratch_cycle_mount
 
 # Snapshot creator...
+#
 +# We rate limit the snapshot creator to one snapshot per full file write.  This
 +# limits the runtime on slow devices, whilst not substantially reducing the
 +# number of snapshots taken on fast devices.
 snappy() {
 	n=0
 	while [ ! -e $finished_file ]; do
+		if [ ! -e $do_snapshot ]; then
+			sleep 0.01
+			continue;
+		fi
 		out="$(_cp_reflink $testdir/file1 $testdir/snap_$n 2>&1)"
 		res=$?
 		echo "$out" | grep -q "No space left" && break
 		test -n "$out" && echo "$out"
 		test $res -ne 0 && break
 		n=$((n + 1))
+		rm -f $do_snapshot
 	done
 }
 
@@ -84,6 +94,7 @@ echo "Snapshot a file undergoing directio rewrite"
 snappy &
 seq $nr_loops -1 0 | while read i; do
 	_pwrite_byte 0x63 $((i * blksz)) $blksz -d $testdir/file1 >> $seqres.full
+	touch $do_snapshot
 done
 touch $finished_file
 wait
-- 
2.15.0.rc0
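As an aside for readers: the flag-file handshake the patch adds can be
sketched in isolation as below. The flag path, counters, and the simulated
write/snapshot bodies are illustrative stand-ins, not the fstests code; the
point is that the flag acts as a token, so the snapshot side can never take
more than one snapshot per completed write.

```shell
#!/bin/sh
# Standalone sketch of the one-snapshot-per-write handshake.
flag=$(mktemp -u)	# hypothetical flag path (token file)
snaps=0
writes=0

for i in 1 2 3; do
	writes=$((writes + 1))	# simulated full-file write
	touch "$flag"		# signal: one write completed

	# snapshot side: only act when a token exists
	if [ -e "$flag" ]; then
		snaps=$((snaps + 1))	# stand-in for _cp_reflink
		rm -f "$flag"		# consume the token
	fi

	# polling again in the same window finds no token, so no extra
	# snapshot is taken -- this is the rate limit
	if [ -e "$flag" ]; then
		snaps=$((snaps + 1))
	fi
done

echo "writes=$writes snaps=$snaps"
```

In the real test the two sides run in separate processes, so the snapshot
loop also sleeps briefly while waiting for the token instead of spinning.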



* Re: [PATCH] generic/166: speed up on slow disks
  2017-10-11 23:15 [PATCH] generic/166: speed up on slow disks Dave Chinner
@ 2017-10-11 23:41 ` Darrick J. Wong
  2017-10-11 23:58   ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: Darrick J. Wong @ 2017-10-11 23:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: fstests

On Thu, Oct 12, 2017 at 10:15:44AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> generic/166 takes way too long to run on iscsi disks - over an *hour* on
> flash based iscsi targets. In comparison, it takes 18s to run on a pmem device.
> 
> The issue is that it takes 3-4s per file write cycle on slow disks, and it does
> a thousand write cycles. The problem is that reflink is so much faster than the
> write cycle that we end up doing many more snapshots on slow disks than on
> fast disks, and this slows the test down even more.
> 
> e.g. the pmem system that takes 18s to run does just under 1000 snapshots -
> roughly one per file write. 20 minutes into the iscsi based test, it had only
> done ~300 write cycles but almost 10,000 snapshots had been taken. IOWs,
> we're doing ~30 snapshots per file write, not ~1.
> 
> Fix this by rate limiting snapshots to at most one per whole file write. This
> reduces the number of snapshots taken on fast devices by ~50% (runtime on the
> pmem device went from 18s to 8s), and on slow devices it caps the count at
> ~1000 snapshots, reducing runtime from 3671s to just 311s.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  tests/generic/166 | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/tests/generic/166 b/tests/generic/166
> index 8600a133f2d3..9b53307b761c 100755
> --- a/tests/generic/166
> +++ b/tests/generic/166
> @@ -55,6 +55,7 @@ _scratch_mount >> $seqres.full 2>&1
>  
>  testdir=$SCRATCH_MNT/test-$seq
>  finished_file=/tmp/finished
> +do_snapshot=/tmp/snapshot
>  rm -rf $finished_file
>  mkdir $testdir
>  
> @@ -68,15 +69,24 @@ _pwrite_byte 0x61 0 $((loops * blksz)) $testdir/file1 >> $seqres.full
>  _scratch_cycle_mount
>  
>  # Snapshot creator...
> +#
> +# We rate limit the snapshot creator to one snapshot per full file write.  This
> +# limits the runtime on slow devices, whilst not substantially reducing the
> +# number of snapshots taken on fast devices.

Hmm... but the whole point of this test is to try to get a simultaneous
reflink and dio rewrite (of the source file) to collide and crash the
system.  Adding $do_snapshot would bias the race towards the beginning
of the dio rewrite, whereas the goal was to try to make it happen
anywhere.

Granted, with proper locking it really doesn't matter... in the early
days of rmap/reflink it was useful for shaking bugs out of the rmap
code.

As an alternative, how about we add another helper to (sleep 120; touch
$timeout_file) and then teach both loops to (test -f $timeout_file &&
break)?
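
For illustration, that watchdog alternative might look like the sketch
below. The interval is shortened from the suggested 120s to 1s so the
sketch finishes quickly, and the loop body is a placeholder for one write
or snapshot pass, not the actual generic/166 code:

```shell
#!/bin/sh
# Watchdog sketch: a background helper touches $timeout_file after a
# fixed interval, and each work loop checks for it and breaks.
timeout_file=$(mktemp -u)

(sleep 1; touch "$timeout_file") &	# 120s in the actual suggestion

iters=0
while :; do
	test -f "$timeout_file" && break	# both loops would carry this check
	iters=$((iters + 1))	# stand-in for one write or snapshot pass
	sleep 0.1
done
wait
rm -f "$timeout_file"
echo "stopped after $iters iterations"
```

This caps total runtime directly instead of limiting the snapshot rate, so
the reflink/write race stays unbiased across the whole file.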

--D

>  snappy() {
>  	n=0
>  	while [ ! -e $finished_file ]; do
> +		if [ ! -e $do_snapshot ]; then
> +			sleep 0.01
> +			continue;
> +		fi
>  		out="$(_cp_reflink $testdir/file1 $testdir/snap_$n 2>&1)"
>  		res=$?
>  		echo "$out" | grep -q "No space left" && break
>  		test -n "$out" && echo "$out"
>  		test $res -ne 0 && break
>  		n=$((n + 1))
> +		rm -f $do_snapshot
>  	done
>  }
>  
> @@ -84,6 +94,7 @@ echo "Snapshot a file undergoing directio rewrite"
>  snappy &
>  seq $nr_loops -1 0 | while read i; do
>  	_pwrite_byte 0x63 $((i * blksz)) $blksz -d $testdir/file1 >> $seqres.full
> +	touch $do_snapshot
>  done
>  touch $finished_file
>  wait
> -- 
> 2.15.0.rc0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH] generic/166: speed up on slow disks
  2017-10-11 23:41 ` Darrick J. Wong
@ 2017-10-11 23:58   ` Dave Chinner
  2017-10-12  1:32     ` Darrick J. Wong
  0 siblings, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2017-10-11 23:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: fstests

On Wed, Oct 11, 2017 at 04:41:27PM -0700, Darrick J. Wong wrote:
> On Thu, Oct 12, 2017 at 10:15:44AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > generic/166 takes way too long to run on iscsi disks - over an *hour* on
> > flash based iscsi targets. In comparison, it takes 18s to run on a pmem device.
> > 
> > The issue is that it takes 3-4s per file write cycle on slow disks, and it does
> > a thousand write cycles. The problem is that reflink is so much faster than the
> > write cycle that we end up doing many more snapshots on slow disks than on
> > fast disks, and this slows the test down even more.
> > 
> > e.g. the pmem system that takes 18s to run does just under 1000 snapshots -
> > roughly one per file write. 20 minutes into the iscsi based test, it had only
> > done ~300 write cycles but almost 10,000 snapshots had been taken. IOWs,
> > we're doing ~30 snapshots per file write, not ~1.
> > 
> > Fix this by rate limiting snapshots to at most one per whole file write. This
> > reduces the number of snapshots taken on fast devices by ~50% (runtime on the
> > pmem device went from 18s to 8s), and on slow devices it caps the count at
> > ~1000 snapshots, reducing runtime from 3671s to just 311s.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  tests/generic/166 | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/tests/generic/166 b/tests/generic/166
> > index 8600a133f2d3..9b53307b761c 100755
> > --- a/tests/generic/166
> > +++ b/tests/generic/166
> > @@ -55,6 +55,7 @@ _scratch_mount >> $seqres.full 2>&1
> >  
> >  testdir=$SCRATCH_MNT/test-$seq
> >  finished_file=/tmp/finished
> > +do_snapshot=/tmp/snapshot
> >  rm -rf $finished_file
> >  mkdir $testdir
> >  
> > @@ -68,15 +69,24 @@ _pwrite_byte 0x61 0 $((loops * blksz)) $testdir/file1 >> $seqres.full
> >  _scratch_cycle_mount
> >  
> >  # Snapshot creator...
> > +#
> > +# We rate limit the snapshot creator to one snapshot per full file write.  This
> > +# limits the runtime on slow devices, whilst not substantially reducing the
> > +# number of snapshots taken on fast devices.
> 
> Hmm... but the whole point of this test is to try to get a simultaneous
> reflink and dio rewrite (of the source file) to collide and crash the
> system.  Adding $do_snapshot would bias the race towards the beginning
> of the dio rewrite, whereas the goal was to try to make it happen
> anywhere.

As we discussed on IRC, that still happens on faster devices
because the sleep time is longer than the file write time. On slow
devices, I don't think it really matters because the reflink race
windows are so small compared to the IO times...

> Granted, with proper locking it really doesn't matter... in the early
> days of rmap/reflink it was useful for shaking bugs out of the rmap
> code.

*nod*

> As an alternative, how about we add another helper to (sleep 120; touch
> $timeout_file) and then teach both loops to (test -f $timeout_file &&
> break)?

Not fussed. Unless there's some pressing reason why biasing the
write/reflink collision towards the start of the file is a problem,
I'm happy with the ~5m runtime this patch results in on these
machines...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] generic/166: speed up on slow disks
  2017-10-11 23:58   ` Dave Chinner
@ 2017-10-12  1:32     ` Darrick J. Wong
  0 siblings, 0 replies; 4+ messages in thread
From: Darrick J. Wong @ 2017-10-12  1:32 UTC (permalink / raw)
  To: Dave Chinner; +Cc: fstests

On Thu, Oct 12, 2017 at 10:58:23AM +1100, Dave Chinner wrote:
> On Wed, Oct 11, 2017 at 04:41:27PM -0700, Darrick J. Wong wrote:
> > On Thu, Oct 12, 2017 at 10:15:44AM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > generic/166 takes way too long to run on iscsi disks - over an *hour* on
> > > flash based iscsi targets. In comparison, it takes 18s to run on a pmem device.
> > > 
> > > The issue is that it takes 3-4s per file write cycle on slow disks, and it does
> > > a thousand write cycles. The problem is that reflink is so much faster than the
> > > write cycle that we end up doing many more snapshots on slow disks than on
> > > fast disks, and this slows the test down even more.
> > > 
> > > e.g. the pmem system that takes 18s to run does just under 1000 snapshots -
> > > roughly one per file write. 20 minutes into the iscsi based test, it had only
> > > done ~300 write cycles but almost 10,000 snapshots had been taken. IOWs,
> > > we're doing ~30 snapshots per file write, not ~1.
> > > 
> > > Fix this by rate limiting snapshots to at most one per whole file write. This
> > > reduces the number of snapshots taken on fast devices by ~50% (runtime on the
> > > pmem device went from 18s to 8s), and on slow devices it caps the count at
> > > ~1000 snapshots, reducing runtime from 3671s to just 311s.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  tests/generic/166 | 11 +++++++++++
> > >  1 file changed, 11 insertions(+)
> > > 
> > > diff --git a/tests/generic/166 b/tests/generic/166
> > > index 8600a133f2d3..9b53307b761c 100755
> > > --- a/tests/generic/166
> > > +++ b/tests/generic/166
> > > @@ -55,6 +55,7 @@ _scratch_mount >> $seqres.full 2>&1
> > >  
> > >  testdir=$SCRATCH_MNT/test-$seq
> > >  finished_file=/tmp/finished
> > > +do_snapshot=/tmp/snapshot
> > >  rm -rf $finished_file
> > >  mkdir $testdir
> > >  
> > > @@ -68,15 +69,24 @@ _pwrite_byte 0x61 0 $((loops * blksz)) $testdir/file1 >> $seqres.full
> > >  _scratch_cycle_mount
> > >  
> > >  # Snapshot creator...
> > > +#
> > > +# We rate limit the snapshot creator to one snapshot per full file write.  This
> > > +# limits the runtime on slow devices, whilst not substantially reducing the
> > > +# number of snapshots taken on fast devices.
> > 
> > Hmm... but the whole point of this test is to try to get a simultaneous
> > reflink and dio rewrite (of the source file) to collide and crash the
> > system.  Adding $do_snapshot would bias the race towards the beginning
> > of the dio rewrite, whereas the goal was to try to make it happen
> > anywhere.
> 
> As we discussed on IRC, that still happens on faster devices
> because the sleep time is longer than the file write time. On slow
> devices, I don't think it really matters because the reflink race
> windows are so small compared to the IO times...
> 
> > Granted, with proper locking it really doesn't matter... in the early
> > days of rmap/reflink it was useful for shaking bugs out of the rmap
> > code.
> 
> *nod*
> 
> > As an alternative, how about we add another helper to (sleep 120; touch
> > $timeout_file) and then teach both loops to (test -f $timeout_file &&
> > break)?
> 
> Not fussed. Unless there's some pressing reason why biasing the
> write/reflink collision towards the start of the file is a problem,
> I'm happy with the ~5m runtime this patch results in on these
> machines...

Not really.  In theory the two files should both be locked during the
reflink so it probably doesn't matter exactly when they race...

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

