* [PATCH] tests/generic: test writepage cached mapping validity
@ 2017-10-26 14:48 Brian Foster
From: Brian Foster @ 2017-10-26 14:48 UTC
To: fstests; +Cc: linux-xfs
XFS has a bug where page writeback can end up sending data to the
wrong location due to a stale, cached file mapping. Add a test to
trigger this problem by racing background writeback with a
truncate/rewrite of the final page of the file.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
Here's a new version of the writepages test I previously posted as RFC.
This variant does not require an artificial delay to reproduce, so I've
dropped the need for the error injection tag.
I have been playing a bit with the file size and iteration count of the
test. I started with something that ran a decent bit longer (~2m) as was
necessary to reproduce on my dev/debug vm, but recently trimmed the file
size and iteration count to something that runs much quicker (~10s) and
reproduces nearly 100% of the time on my actual test hardware. The
tradeoff is that reproducibility is much lower on my debug vm (~20-25%
perhaps). The test still does reproduce when run over 10-15 iters, so I
opted for the quicker test.
In all, I am a bit curious about whether this reproduces reliably on
others' test setups. If not, does tweaking the size/iterations improve
the reproducibility?
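One way to collect the numbers asked about above is to rerun the test a fixed number of times and count failures. A minimal sketch, assuming the fstests `./check` script exits non-zero when the test fails; `measure_rate` is a hypothetical helper, not something fstests provides:

```shell
#!/usr/bin/env bash
# measure_rate RUNS CMD...: hypothetical helper (not part of fstests) that
# runs CMD RUNS times and reports how many runs failed. With CMD being a
# single test invocation, a failure means the bug reproduced.
measure_rate() {
    local runs=$1
    shift
    local fails=0 i
    for ((i = 0; i < runs; i++)); do
        # Any non-zero exit status is counted as a reproduction.
        if ! "$@" > /dev/null 2>&1; then
            fails=$((fails + 1))
        fi
    done
    echo "$fails/$runs failed ($((100 * fails / runs))%)"
}
```

For example, `measure_rate 20 ./check generic/999` from the top of the fstests tree, repeated with different file sizes and iteration counts. Any non-zero exit is counted, so an unrelated failure would inflate the rate; checking the test's results output before trusting the number seems prudent.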
Brian
v1:
- New test algorithm that does not require artificial delay.
- Created as generic test.
rfc: https://marc.info/?l=linux-xfs&m=150886719725497&w=2
tests/generic/999 | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++
tests/generic/999.out | 2 ++
tests/generic/group | 1 +
3 files changed, 97 insertions(+)
create mode 100755 tests/generic/999
create mode 100644 tests/generic/999.out
diff --git a/tests/generic/999 b/tests/generic/999
new file mode 100755
index 0000000..9e56a1e
--- /dev/null
+++ b/tests/generic/999
@@ -0,0 +1,94 @@
+#! /bin/bash
+# FS QA Test 999
+#
+# Test XFS page writeback code for races with the cached file mapping. XFS
+# caches the file -> block mapping for a full extent once it is initially looked
+# up. The cached mapping is used for all subsequent pages in the same writeback
+# cycle that cover the associated extent. Under certain conditions, it is
+# possible for concurrent operations on the file to invalidate the cached
+# mapping without the knowledge of writeback. Writeback ends up sending I/O to a
+# partly stale mapping and potentially leaving delalloc blocks in the current
+# mapping unconverted.
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2017 Red Hat, Inc. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1 # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+_require_test_program "feature"
+
+_scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
+_scratch_mount || _fail "mount failed"
+
+file=$SCRATCH_MNT/file
+filesize=$((1024 * 1024 * 32))
+pagesize=`src/feature -s`
+truncsize=$((filesize - pagesize))
+
+for i in $(seq 0 15); do
+	# Truncate the file and fsync to persist the final size on-disk. This is
+	# required so the subsequent truncate will not wait on writeback.
+	$XFS_IO_PROG -fc "truncate 0" $file
+	$XFS_IO_PROG -c "truncate $filesize" -c fsync $file
+
+	# create a small enough delalloc extent to likely be contiguous
+	$XFS_IO_PROG -c "pwrite 0 $filesize" $file >> $seqres.full 2>&1
+
+	# Start writeback and a racing truncate and rewrite of the final page.
+	$XFS_IO_PROG -c "sync_range -w 0 0" $file &
+	sync_pid=$!
+	$XFS_IO_PROG -c "truncate $truncsize" \
+		-c "pwrite $truncsize $pagesize" $file >> $seqres.full 2>&1
+
+	# If the test fails, the most likely outcome is an sb_fdblocks mismatch
+	# and/or an associated delalloc assert failure on inode reclaim. Cycle
+	# the mount to trigger detection.
+	wait $sync_pid
+	_scratch_cycle_mount || _fail "mount failed"
+done
+
+echo Silence is golden
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/999.out b/tests/generic/999.out
new file mode 100644
index 0000000..3b276ca
--- /dev/null
+++ b/tests/generic/999.out
@@ -0,0 +1,2 @@
+QA output created by 999
+Silence is golden
diff --git a/tests/generic/group b/tests/generic/group
index fbe0a7f..89342da 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -468,3 +468,4 @@
463 auto quick clone dangerous
464 auto rw
465 auto rw quick aio
+999 auto quick
--
2.9.5
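The header comment of the test describes the failure in prose. As an illustration only, the toy model below walks through the same sequence with a plain bash array: writeback caches one extent, a racing truncate and rewrite replaces the final page's block with a delalloc reservation, and writeback still targets the old block. The block numbers, page count, and `simulate_stale_mapping` are made up; this is not how XFS represents mappings internally.

```shell
#!/usr/bin/env bash
# Toy model (hypothetical, NOT XFS code) of writeback trusting a cached
# file -> block mapping after a concurrent truncate/rewrite changed it.
# "Blocks" are plain numbers; "D" marks a delalloc reservation.
simulate_stale_mapping() {
    local -a map=()    # page index -> block number, or D for delalloc
    local npages=4 p target
    # Pages 0..3 have been converted from delalloc to blocks 100..103.
    for ((p = 0; p < npages; p++)); do
        map[$p]=$((100 + p))
    done

    # Writeback looks up the extent once and caches its start and length.
    local cached_start=${map[0]} cached_len=$npages

    # A racing truncate and rewrite of the final page frees its block and
    # leaves a fresh delalloc reservation in its place.
    map[$((npages - 1))]=D

    # Writeback keeps using the cached mapping instead of revalidating it.
    for ((p = 0; p < cached_len; p++)); do
        target=$((cached_start + p))
        if [[ ${map[$p]} != "$target" ]]; then
            echo "page $p: I/O sent to stale block $target (current: ${map[$p]})"
        fi
    done
}

simulate_stale_mapping
```

In the real failure, it is the delalloc reservation left unconverted for that final page that the test expects to surface when the mount is cycled, as an sb_fdblocks mismatch or a delalloc assert on inode reclaim.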
Re: [PATCH] tests/generic: test writepage cached mapping validity
From: Eryu Guan @ 2017-10-26 15:34 UTC
To: Brian Foster; +Cc: fstests, linux-xfs
On Thu, Oct 26, 2017 at 10:48:16AM -0400, Brian Foster wrote:
> XFS has a bug where page writeback can end up sending data to the
> wrong location due to a stale, cached file mapping. Add a test to
> trigger this problem by racing background writeback with a
> truncate/rewrite of the final page of the file.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
Thanks a lot for the new test!
> ---
>
> Here's a new version of the writepages test I previously posted as RFC.
> This variant does not require an artificial delay to reproduce, so I've
> dropped the need for the error injection tag.
>
> I have been playing a bit with the file size and iteration count of the
> test. I started with something that ran a decent bit longer (~2m) as was
> necessary to reproduce on my dev/debug vm, but recently trimmed the file
> size and iteration count to something that runs much quicker (~10s) and
> reproduces nearly 100% of the time on my actual test hardware. The
> tradeoff is that reproducibility is much lower on my debug vm (~20-25%
> perhaps). The test still does reproduce when run over 10-15 iters, so I
> opted for the quicker test.
>
> In all, I am a bit curious about whether this reproduces reliably on
> others' test setups. If not, does tweaking the size/iterations improve
> the reproducibility?
On my test vm, with the default size/iteration numbers, the
reproducibility is around 40%, run time is 3s. Then I doubled the
iteration number, and it's 100% reproduced, run time is 7s.
On my real hardware, I have to double both file size and iteration
numbers to reproduce, reproducibility is ~20%, run time 35s.
Note that the vm is running v4.14-rc5 based 'xfs-4.14-fixes-7' tag from
Darrick's tree and the real hardware is running v4.14-rc6.
Thanks,
Eryu
Re: [PATCH] tests/generic: test writepage cached mapping validity
From: Brian Foster @ 2017-10-26 16:12 UTC
To: Eryu Guan; +Cc: fstests, linux-xfs
On Thu, Oct 26, 2017 at 11:34:02PM +0800, Eryu Guan wrote:
> On Thu, Oct 26, 2017 at 10:48:16AM -0400, Brian Foster wrote:
> > XFS has a bug where page writeback can end up sending data to the
> > wrong location due to a stale, cached file mapping. Add a test to
> > trigger this problem by racing background writeback with a
> > truncate/rewrite of the final page of the file.
> >
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
>
> Thanks a lot for the new test!
>
> > ---
> >
> > Here's a new version of the writepages test I previously posted as RFC.
> > This variant does not require an artificial delay to reproduce, so I've
> > dropped the need for the error injection tag.
> >
> > I have been playing a bit with the file size and iteration count of the
> > test. I started with something that ran a decent bit longer (~2m) as was
> > necessary to reproduce on my dev/debug vm, but recently trimmed the file
> > size and iteration count to something that runs much quicker (~10s) and
> > reproduces nearly 100% of the time on my actual test hardware. The
> > tradeoff is that reproducibility is much lower on my debug vm (~20-25%
> > perhaps). The test still does reproduce when run over 10-15 iters, so I
> > opted for the quicker test.
> >
> > In all, I am a bit curious about whether this reproduces reliably on
> > others' test setups. If not, does tweaking the size/iterations improve
> > the reproducibility?
>
> On my test vm, with the default size/iteration numbers, the
> reproducibility is around 40%, run time is 3s. Then I doubled the
> iteration number, and it's 100% reproduced, run time is 7s.
>
> On my real hardware, I have to double both file size and iteration
> numbers to reproduce, reproducibility is ~20%, run time 35s.
>
> Note that the vm is running v4.14-rc5 based 'xfs-4.14-fixes-7' tag from
> Darrick's tree and the real hardware is running v4.14-rc6.
>
Thanks for testing this... It's interesting that you don't seem to
reproduce at all on the real hardware with the current values. What do
you have for storage on both of these setups? My VM is a slow, single
spindle while the hardware is also spinning rust but on a hardware raid.
If I run with 64MB, 32 iters, I'm at ~48 seconds on the VM. I can check
on bare metal as soon as my current test run completes.
Brian
Re: [PATCH] tests/generic: test writepage cached mapping validity
From: Eryu Guan @ 2017-10-26 16:40 UTC
To: Brian Foster; +Cc: fstests, linux-xfs
On Thu, Oct 26, 2017 at 12:12:47PM -0400, Brian Foster wrote:
> On Thu, Oct 26, 2017 at 11:34:02PM +0800, Eryu Guan wrote:
> > On Thu, Oct 26, 2017 at 10:48:16AM -0400, Brian Foster wrote:
> > > XFS has a bug where page writeback can end up sending data to the
> > > wrong location due to a stale, cached file mapping. Add a test to
> > > trigger this problem by racing background writeback with a
> > > truncate/rewrite of the final page of the file.
> > >
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> >
> > Thanks a lot for the new test!
> >
> > > ---
> > >
> > > Here's a new version of the writepages test I previously posted as RFC.
> > > This variant does not require an artificial delay to reproduce, so I've
> > > dropped the need for the error injection tag.
> > >
> > > I have been playing a bit with the file size and iteration count of the
> > > test. I started with something that ran a decent bit longer (~2m) as was
> > > necessary to reproduce on my dev/debug vm, but recently trimmed the file
> > > size and iteration count to something that runs much quicker (~10s) and
> > > reproduces nearly 100% of the time on my actual test hardware. The
> > > tradeoff is that reproducibility is much lower on my debug vm (~20-25%
> > > perhaps). The test still does reproduce when run over 10-15 iters, so I
> > > opted for the quicker test.
> > >
> > > In all, I am a bit curious about whether this reproduces reliably on
> > > others' test setups. If not, does tweaking the size/iterations improve
> > > the reproducibility?
> >
> > On my test vm, with the default size/iteration numbers, the
> > reproducibility is around 40%, run time is 3s. Then I doubled the
> > iteration number, and it's 100% reproduced, run time is 7s.
> >
> > On my real hardware, I have to double both file size and iteration
> > numbers to reproduce, reproducibility is ~20%, run time 35s.
> >
> > Note that the vm is running v4.14-rc5 based 'xfs-4.14-fixes-7' tag from
> > Darrick's tree and the real hardware is running v4.14-rc6.
> >
>
> Thanks for testing this... It's interesting that you don't seem to
> reproduce at all on the real hardware with the current values. What do
> you have for storage on both of these setups? My VM is a slow, single
> spindle while the hardware is also spinning rust but on a hardware raid.
My vm is a kvm guest with 4 vcpus and 8G mem running on a RHEL6 host; the
underlying storage hosting the OS image is hardware raid (HP Smart
Array). The real hardware is an IBM box with 8 logical cpus and 8G mem,
and 4 sata disks connected to MegaRAID but configured as JBOD; I used
two partitions of one of the four disks.
Thanks,
Eryu
Re: [PATCH] tests/generic: test writepage cached mapping validity
From: Brian Foster @ 2017-10-26 17:17 UTC
To: Eryu Guan; +Cc: fstests, linux-xfs
On Fri, Oct 27, 2017 at 12:40:17AM +0800, Eryu Guan wrote:
> On Thu, Oct 26, 2017 at 12:12:47PM -0400, Brian Foster wrote:
> > On Thu, Oct 26, 2017 at 11:34:02PM +0800, Eryu Guan wrote:
> > > On Thu, Oct 26, 2017 at 10:48:16AM -0400, Brian Foster wrote:
> > > > XFS has a bug where page writeback can end up sending data to the
> > > > wrong location due to a stale, cached file mapping. Add a test to
> > > > trigger this problem by racing background writeback with a
> > > > truncate/rewrite of the final page of the file.
> > > >
> > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > >
> > > Thanks a lot for the new test!
> > >
> > > > ---
> > > >
> > > > Here's a new version of the writepages test I previously posted as RFC.
> > > > This variant does not require an artificial delay to reproduce, so I've
> > > > dropped the need for the error injection tag.
> > > >
> > > > I have been playing a bit with the file size and iteration count of the
> > > > test. I started with something that ran a decent bit longer (~2m) as was
> > > > necessary to reproduce on my dev/debug vm, but recently trimmed the file
> > > > size and iteration count to something that runs much quicker (~10s) and
> > > > reproduces nearly 100% of the time on my actual test hardware. The
> > > > tradeoff is that reproducibility is much lower on my debug vm (~20-25%
> > > > perhaps). The test still does reproduce when run over 10-15 iters, so I
> > > > opted for the quicker test.
> > > >
> > > > In all, I am a bit curious about whether this reproduces reliably on
> > > > others' test setups. If not, does tweaking the size/iterations improve
> > > > the reproducibility?
> > >
> > > On my test vm, with the default size/iteration numbers, the
> > > reproducibility is around 40%, run time is 3s. Then I doubled the
> > > iteration number, and it's 100% reproduced, run time is 7s.
> > >
> > > On my real hardware, I have to double both file size and iteration
> > > numbers to reproduce, reproducibility is ~20%, run time 35s.
> > >
> > > Note that the vm is running v4.14-rc5 based 'xfs-4.14-fixes-7' tag from
> > > Darrick's tree and the real hardware is running v4.14-rc6.
> > >
> >
> > Thanks for testing this... It's interesting that you don't seem to
> > reproduce at all on the real hardware with the current values. What do
> > you have for storage on both of these setups? My VM is a slow, single
> > spindle while the hardware is also spinning rust but on a hardware raid.
>
> My vm is a kvm guest with 4 vcpus and 8G mem running on a RHEL6 host; the
> underlying storage hosting the OS image is hardware raid (HP Smart
> Array). The real hardware is an IBM box with 8 logical cpus and 8G mem,
> and 4 sata disks connected to MegaRAID but configured as JBOD; I used
> two partitions of one of the four disks.
>
It's still only 15s or so on my test box with 64MB and 32 iters. I
generally don't do full xfstests runs on my vm anyway because it's so
damn slow ;P. So I'm OK with upping both the size and iters if that helps
reproducibility, but I'll wait a bit before posting v2 to see if anybody
else chimes in with more data.
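If v2 does bump both values, one option is to derive everything from two settings so that slower or faster setups can scale the test without editing the loop. A minimal sketch; `WB_FILESIZE_MB` and `WB_ITERS` are hypothetical environment variables (not existing fstests knobs), and `getconf PAGESIZE` stands in for the patch's `src/feature -s`:

```shell
#!/usr/bin/env bash
# Hypothetical knobs (not existing fstests variables); defaults match v1.
size_mb=${WB_FILESIZE_MB:-32}
iters=${WB_ITERS:-16}

filesize=$((size_mb * 1024 * 1024))
pagesize=$(getconf PAGESIZE)
# Offset of the final page, which is truncated and rewritten each iteration.
truncsize=$((filesize - pagesize))

# "seq 0 15" in the posted loop is 16 iterations, hence iters - 1 here.
for i in $(seq 0 $((iters - 1))); do
    : # loop body from the test would go here
done
echo "filesize=$filesize pagesize=$pagesize truncsize=$truncsize iters=$iters"
```

Setting `WB_FILESIZE_MB=64 WB_ITERS=32` would then correspond to the doubled configuration discussed in this thread.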
I should note that I'm not terribly concerned with having 100%
reproducibility as opposed to something reasonable (e.g., 1 in 4 tries
is not so bad when we consider that some of our bugs have required
hundreds of iterations of certain tests to reproduce). Any prospective
fix should probably be tested against multiple iterations of this test.
Further, any regressions that may arise down the road would most likely
crop up as failures at some point between all of us running xfstests on
our various setups. Thanks again.
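To put the "1 in 4" figure in numbers: if each run independently hits the race with probability p, the chance that n runs catch it at least once is 1 - (1 - p)^n. The sketch below assumes runs are independent, which is an idealization since race timing on a given box is not truly random; detect_prob and runs_needed are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helper: probability that at least one of n independent
# runs reproduces a bug that each single run hits with probability p.
detect_prob() {
	local p=$1 n=$2
	awk -v p="$p" -v n="$n" 'BEGIN { printf "%.4f\n", 1 - (1 - p) ^ n }'
}

# Hypothetical helper: smallest whole number of runs whose detection
# probability reaches the target (e.g. 0.99).
runs_needed() {
	local p=$1 target=$2
	awk -v p="$p" -v t="$target" 'BEGIN {
		n = log(1 - t) / log(1 - p)
		r = int(n)
		if (r < n)
			r++	# round up to a whole run
		printf "%d\n", r
	}'
}

detect_prob 0.25 10	# 1 in 4 per run, 10 runs: prints 0.9437
runs_needed 0.25 0.99	# runs for 99% detection: prints 17
```

Under these assumptions, ten runs of a 1-in-4 reproducer catch the bug about 94% of the time, and about 17 runs are needed for 99%, which is consistent with testing any prospective fix over multiple iterations.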
Brian
> Thanks,
> Eryu
>
> >
> > If I run with 64MB, 32 iters, I'm at ~48 seconds on the VM. I can check
> > on bare metal as soon as the test run I have currently running
> > completes.
> >
> > Brian
> >
> > > Thanks,
> > > Eryu
> > >
> > > >
> > > > Brian
> > > >
> > > > v1:
> > > > - New test algorithm that does not require artificial delay.
> > > > - Created as generic test.
> > > > rfc: https://marc.info/?l=linux-xfs&m=150886719725497&w=2
> > > >
> > > > tests/generic/999 | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > tests/generic/999.out | 2 ++
> > > > tests/generic/group | 1 +
> > > > 3 files changed, 97 insertions(+)
> > > > create mode 100755 tests/generic/999
> > > > create mode 100644 tests/generic/999.out
> > > >
> > > > diff --git a/tests/generic/999 b/tests/generic/999
> > > > new file mode 100755
> > > > index 0000000..9e56a1e
> > > > --- /dev/null
> > > > +++ b/tests/generic/999
> > > > @@ -0,0 +1,94 @@
> > > > +#! /bin/bash
> > > > +# FS QA Test 999
> > > > +#
> > > > +# Test XFS page writeback code for races with the cached file mapping. XFS
> > > > +# caches the file -> block mapping for a full extent once it is initially looked
> > > > +# up. The cached mapping is used for all subsequent pages in the same writeback
> > > > +# cycle that cover the associated extent. Under certain conditions, it is
> > > > +# possible for concurrent operations on the file to invalidate the cached
> > > > +# mapping without the knowledge of writeback. Writeback ends up sending I/O to a
> > > > +# partly stale mapping and potentially leaving delalloc blocks in the current
> > > > +# mapping unconverted.
> > > > +#
> > > > +#-----------------------------------------------------------------------
> > > > +# Copyright (c) 2017 Red Hat, Inc. All Rights Reserved.
> > > > +#
> > > > +# This program is free software; you can redistribute it and/or
> > > > +# modify it under the terms of the GNU General Public License as
> > > > +# published by the Free Software Foundation.
> > > > +#
> > > > +# This program is distributed in the hope that it would be useful,
> > > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > > > +# GNU General Public License for more details.
> > > > +#
> > > > +# You should have received a copy of the GNU General Public License
> > > > +# along with this program; if not, write the Free Software Foundation,
> > > > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> > > > +#-----------------------------------------------------------------------
> > > > +#
> > > > +
> > > > +seq=`basename $0`
> > > > +seqres=$RESULT_DIR/$seq
> > > > +echo "QA output created by $seq"
> > > > +
> > > > +here=`pwd`
> > > > +tmp=/tmp/$$
> > > > +status=1 # failure is the default!
> > > > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > > > +
> > > > +_cleanup()
> > > > +{
> > > > + cd /
> > > > + rm -f $tmp.*
> > > > +}
> > > > +
> > > > +# get standard environment, filters and checks
> > > > +. ./common/rc
> > > > +
> > > > +# remove previous $seqres.full before test
> > > > +rm -f $seqres.full
> > > > +
> > > > +# real QA test starts here
> > > > +
> > > > +# Modify as appropriate.
> > > > +_supported_fs generic
> > > > +_supported_os Linux
> > > > +_require_scratch
> > > > +_require_test_program "feature"
> > > > +
> > > > +_scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
> > > > +_scratch_mount || _fail "mount failed"
> > > > +
> > > > +file=$SCRATCH_MNT/file
> > > > +filesize=$((1024 * 1024 * 32))
> > > > +pagesize=`src/feature -s`
> > > > +truncsize=$((filesize - pagesize))
> > > > +
> > > > +for i in $(seq 0 15); do
> > > > + # Truncate the file and fsync to persist the final size on-disk. This is
> > > > + # required so the subsequent truncate will not wait on writeback.
> > > > + $XFS_IO_PROG -fc "truncate 0" $file
> > > > + $XFS_IO_PROG -c "truncate $filesize" -c fsync $file
> > > > +
> > > > + # create a small enough delalloc extent to likely be contiguous
> > > > + $XFS_IO_PROG -c "pwrite 0 $filesize" $file >> $seqres.full 2>&1
> > > > +
> > > > + # Start writeback and a racing truncate and rewrite of the final page.
> > > > + $XFS_IO_PROG -c "sync_range -w 0 0" $file &
> > > > + sync_pid=$!
> > > > + $XFS_IO_PROG -c "truncate $truncsize" \
> > > > + -c "pwrite $truncsize $pagesize" $file >> $seqres.full 2>&1
> > > > +
> > > > + # If the test fails, the most likely outcome is an sb_fdblocks mismatch
> > > > + # and/or an associated delalloc assert failure on inode reclaim. Cycle
> > > > + # the mount to trigger detection.
> > > > + wait $sync_pid
> > > > + _scratch_cycle_mount || _fail "mount failed"
> > > > +done
> > > > +
> > > > +echo Silence is golden
> > > > +
> > > > +# success, all done
> > > > +status=0
> > > > +exit
> > > > diff --git a/tests/generic/999.out b/tests/generic/999.out
> > > > new file mode 100644
> > > > index 0000000..3b276ca
> > > > --- /dev/null
> > > > +++ b/tests/generic/999.out
> > > > @@ -0,0 +1,2 @@
> > > > +QA output created by 999
> > > > +Silence is golden
> > > > diff --git a/tests/generic/group b/tests/generic/group
> > > > index fbe0a7f..89342da 100644
> > > > --- a/tests/generic/group
> > > > +++ b/tests/generic/group
> > > > @@ -468,3 +468,4 @@
> > > > 463 auto quick clone dangerous
> > > > 464 auto rw
> > > > 465 auto rw quick aio
> > > > +999 auto quick
> > > > --
> > > > 2.9.5
> > > >
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe fstests" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at http://vger.kernel.org/majordomo-info.html
* [PATCH 0/4] xfs: properly invalidate cached writeback mapping
@ 2019-01-11 12:30 Brian Foster
2019-01-11 13:31 ` [PATCH] tests/generic: test writepage cached mapping validity Brian Foster
0 siblings, 1 reply; 9+ messages in thread
From: Brian Foster @ 2019-01-11 12:30 UTC (permalink / raw)
To: linux-xfs
Hi all,
This series attempts to fix the stale writepage mapping problem in XFS.
The problem is essentially that ->writepages() caches the current extent
across multiple writepage instances and in certain circumstances the
cached mapping can be made invalid by concurrent filesystem operations.
For example, even with the current EOF trim band-aid for dealing with
post-eof speculative preallocation, a truncate+append sequence that
happens to race with background writeback can lead to a writepage to an
incorrect location.
Since we already have an xfs_ifork change/sequence number mechanism in
place, we reuse that to invalidate cached writeback mappings any time
the associated data fork has changed. Note that while certain workloads
might lead to a high frequency of spurious invalidations (e.g.,
with allocsize=4k mounts, files with a predetermined size such as vdisk
images, etc.), I've not been able to reproduce any noticeable effects at
a user level. See the patch 3 commit log description for further
discussion.
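The invalidation idea can be illustrated with a toy model. This is a shell simulation under simplifying assumptions, not the xfs_aops.c implementation: fork_seq stands in for the xfs_ifork sequence number, and modify_fork, writepage and the single-letter "extents" are hypothetical.

```shell
#!/bin/bash
# Toy model (NOT XFS code) of validating a cached writeback mapping
# against a fork sequence counter. All names here are hypothetical.

fork_seq=0	# bumped on every modeled data fork change
cached_seq=-1	# counter value when the mapping was cached (-1: none)
cached_map=""	# the cached "extent"
extent="A"	# current contents of the modeled data fork
lookups=0	# number of (re)lookups performed

# A concurrent operation (e.g. truncate + rewrite) changes the fork.
modify_fork() {
	extent=$1
	fork_seq=$((fork_seq + 1))
}

# writepage: reuse the cached mapping only while the counter matches.
writepage() {
	if [ "$cached_seq" -ne "$fork_seq" ]; then
		cached_map=$extent
		cached_seq=$fork_seq
		lookups=$((lookups + 1))
	fi
	echo "page $1 -> $cached_map"
}

writepage 0	# looks up and caches A
writepage 1	# reuses cached A
modify_fork B	# racing change bumps the counter
writepage 2	# counter mismatch: looks up B instead of reusing stale A
```

Dropping the counter comparison (always reusing cached_map once it is set) models the buggy case: page 2 would still be sent to the stale mapping A.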
If we do run into use cases and workloads for which this is a problem, I
think there are options to further restrict seqno changing events (or
use multiple counters for subsets of change events) for less frequent
invalidations. For example, a sequence count that only tracks block
removals may still be sufficient to preserve coherency of cached
writeback mappings. Since this is all handwavy and theoretical, I opted
to keep the code simple and only deal with this should the need arise.
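The removal-only counter mentioned above is described as theoretical, so the following is only a hypothetical toy model of that idea, not proposed XFS code: fork_change, check_cache and remove_seq are invented names, and whether such a counter is actually sufficient for coherency is not established here.

```shell
#!/bin/bash
# Toy model (NOT XFS code) of the hypothetical variant: only block
# removals bump the counter, so additions do not invalidate the cache.

remove_seq=0	# bumped only when blocks are removed
cached_seq=-1	# counter value when the mapping was cached (-1: none)
lookups=0	# cache fills, including the initial lookup

# Model a fork change; $1 is "add" or "remove".
fork_change() {
	if [ "$1" = "remove" ]; then
		remove_seq=$((remove_seq + 1))
	fi
}

# Refill the cached mapping only if the removal counter has moved.
check_cache() {
	if [ "$cached_seq" -ne "$remove_seq" ]; then
		cached_seq=$remove_seq
		lookups=$((lookups + 1))
	fi
}

check_cache		# initial lookup
fork_change add		# e.g. new allocation: cache still reused
check_cache
fork_change remove	# e.g. truncate: cache invalidated
check_cache
echo "lookups: $lookups"
```

With a counter that tracks every fork change, the same sequence would trigger three lookups instead of two; the saved lookup is the kind of spurious invalidation the variant aims to avoid.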
Patch 1 is a stable fix for the initial EOF trim patch. Patches 2-4
tweak the fork seqno mechanism to work for data forks, use it to
invalidate the cached writeback map and remove the EOF trim mechanism.
This has been tested via xfstests on multiple FSB sizes and fsx without
any explosions.
Thoughts, reviews, flames appreciated.
Brian
Brian Foster (4):
xfs: eof trim writeback mapping as soon as it is cached
xfs: update fork seq counter on data fork changes
xfs: validate writeback mapping using data fork seq counter
xfs: remove superfluous writeback mapping eof trimming
fs/xfs/libxfs/xfs_bmap.c | 11 -----------
fs/xfs/libxfs/xfs_bmap.h | 1 -
fs/xfs/libxfs/xfs_iext_tree.c | 13 ++++++-------
fs/xfs/libxfs/xfs_inode_fork.h | 2 +-
fs/xfs/xfs_aops.c | 21 ++++++---------------
fs/xfs/xfs_iomap.c | 4 ++--
6 files changed, 15 insertions(+), 37 deletions(-)
--
2.17.2
* [PATCH] tests/generic: test writepage cached mapping validity
2019-01-11 12:30 [PATCH 0/4] xfs: properly invalidate cached writeback mapping Brian Foster
@ 2019-01-11 13:31 ` Brian Foster
2019-01-14 9:30 ` Eryu Guan
0 siblings, 1 reply; 9+ messages in thread
From: Brian Foster @ 2019-01-11 13:31 UTC (permalink / raw)
To: fstests; +Cc: linux-xfs
XFS has a bug where page writeback can end up sending data to the
wrong location due to a stale, cached file mapping. Add a test to
trigger this problem by racing background writeback with a
truncate/rewrite of the final page of the file.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
Hi all,
This is a resend of an old post[1] that never quite made it upstream. It
wasn't a big deal at the time because we didn't really have a proper fix
for the problem. I'm resending now because there is a proposed fix[2].
I've verified that this still reproduces the problem and no longer fails
with the fix applied (in hundreds of iters). Note that reproduction may
require many iterations. It took me anywhere from 5 to 30 or so on the
box I tested, which I think is reasonable for the tradeoff of a fairly
quick test. There was some discussion on the original post around making
the test run longer for a more reliable reproducer, but I'm not sure how
valuable that is given this is a targeted regression test. Thoughts
appreciated.
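Since reproduction may take anywhere from 5 to 30 or so iterations, a small wrapper that repeats the test until its first failure can help. This is a minimal sketch; run_until_fail is a hypothetical helper, and the usage line assumes fstests' ./check exits non-zero when a test fails.

```shell
#!/bin/bash
# Hypothetical helper: run a command up to $1 times and report the first
# iteration on which it fails; prints "passed N" if it never fails.
run_until_fail() {
	local max=$1 i
	shift
	for ((i = 1; i <= max; i++)); do
		if ! "$@"; then
			echo "failed on iteration $i"
			return 1
		fi
	done
	echo "passed $max"
	return 0
}

# Example (assumed usage; adjust the path to your fstests checkout):
#   cd /path/to/xfstests && run_until_fail 30 ./check generic/999
```

Reporting the failing iteration makes it easy to compare reproduction rates across machines, which is what the earlier discussion about size and iteration counts was trying to do.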
Brian
[1] https://marc.info/?l=fstests&m=150902929900510&w=2
[2] https://marc.info/?l=linux-xfs&m=154721212321112&w=2
tests/generic/999 | 94 +++++++++++++++++++++++++++++++++++++++++++
tests/generic/999.out | 2 +
tests/generic/group | 1 +
3 files changed, 97 insertions(+)
create mode 100755 tests/generic/999
create mode 100644 tests/generic/999.out
diff --git a/tests/generic/999 b/tests/generic/999
new file mode 100755
index 00000000..9e56a1e0
--- /dev/null
+++ b/tests/generic/999
@@ -0,0 +1,94 @@
+#! /bin/bash
+# FS QA Test 999
+#
+# Test XFS page writeback code for races with the cached file mapping. XFS
+# caches the file -> block mapping for a full extent once it is initially looked
+# up. The cached mapping is used for all subsequent pages in the same writeback
+# cycle that cover the associated extent. Under certain conditions, it is
+# possible for concurrent operations on the file to invalidate the cached
+# mapping without the knowledge of writeback. Writeback ends up sending I/O to a
+# partly stale mapping and potentially leaving delalloc blocks in the current
+# mapping unconverted.
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2017 Red Hat, Inc. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1 # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+ cd /
+ rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+_require_test_program "feature"
+
+_scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
+_scratch_mount || _fail "mount failed"
+
+file=$SCRATCH_MNT/file
+filesize=$((1024 * 1024 * 32))
+pagesize=`src/feature -s`
+truncsize=$((filesize - pagesize))
+
+for i in $(seq 0 15); do
+ # Truncate the file and fsync to persist the final size on-disk. This is
+ # required so the subsequent truncate will not wait on writeback.
+ $XFS_IO_PROG -fc "truncate 0" $file
+ $XFS_IO_PROG -c "truncate $filesize" -c fsync $file
+
+ # create a small enough delalloc extent to likely be contiguous
+ $XFS_IO_PROG -c "pwrite 0 $filesize" $file >> $seqres.full 2>&1
+
+ # Start writeback and a racing truncate and rewrite of the final page.
+ $XFS_IO_PROG -c "sync_range -w 0 0" $file &
+ sync_pid=$!
+ $XFS_IO_PROG -c "truncate $truncsize" \
+ -c "pwrite $truncsize $pagesize" $file >> $seqres.full 2>&1
+
+ # If the test fails, the most likely outcome is an sb_fdblocks mismatch
+ # and/or an associated delalloc assert failure on inode reclaim. Cycle
+ # the mount to trigger detection.
+ wait $sync_pid
+ _scratch_cycle_mount || _fail "mount failed"
+done
+
+echo Silence is golden
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/999.out b/tests/generic/999.out
new file mode 100644
index 00000000..3b276ca8
--- /dev/null
+++ b/tests/generic/999.out
@@ -0,0 +1,2 @@
+QA output created by 999
+Silence is golden
diff --git a/tests/generic/group b/tests/generic/group
index ea5aa7aa..ce165981 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -525,3 +525,4 @@
520 auto quick log
521 soak long_rw
522 soak long_rw
+999 auto quick
--
2.17.2
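Each loop iteration truncates the file back by exactly one page and rewrites that page, so only the final page races with writeback. The sketch below reproduces the size arithmetic outside fstests; getconf PAGESIZE stands in for src/feature -s (an assumption: both are expected to report the system page size).

```shell
#!/bin/bash
# Size arithmetic from the test; getconf PAGESIZE replaces src/feature -s.
filesize=$((1024 * 1024 * 32))		# 32 MiB file
pagesize=$(getconf PAGESIZE)		# e.g. 4096 on x86_64
truncsize=$((filesize - pagesize))	# start of the final page

echo "filesize=$filesize pagesize=$pagesize truncsize=$truncsize"
```

On a 4 KiB page system this gives truncsize=33550336. Doubling the file size, as tried on some setups earlier in the thread, only changes filesize; the truncate point still falls one page before EOF.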
* Re: [PATCH] tests/generic: test writepage cached mapping validity
2019-01-11 13:31 ` [PATCH] tests/generic: test writepage cached mapping validity Brian Foster
@ 2019-01-14 9:30 ` Eryu Guan
2019-01-14 15:34 ` Brian Foster
2019-01-15 3:52 ` Dave Chinner
0 siblings, 2 replies; 9+ messages in thread
From: Eryu Guan @ 2019-01-14 9:30 UTC (permalink / raw)
To: Brian Foster; +Cc: fstests, linux-xfs
On Fri, Jan 11, 2019 at 08:31:24AM -0500, Brian Foster wrote:
> XFS has a bug where page writeback can end up sending data to the
> wrong location due to a stale, cached file mapping. Add a test to
> trigger this problem by racing background writeback with a
> truncate/rewrite of the final page of the file.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>
> Hi all,
>
> This is a resend of an old post[1] that never quite made it upstream. It
> wasn't a big deal at the time because we didn't really have a proper fix
> for the problem. I'm resending now because there is a proposed fix[2].
Thanks for the resending!
>
> I've verified that this still reproduces the problem and no longer fails
> with the fix applied (in hundreds of iters). Note that reproduction may
> require many iterations. It took me anywhere from 5 to 30 or so on the
> box I tested, which I think is reasonable for the tradeoff of a fairly
> quick test. There was some discussion on the original post around making
> the test run longer for a more reliable reproducer, but I'm not sure how
> valuable that is given this is a targeted regression test. Thoughts
> appreciated.
It took me around 5 iterations to hit the corruption, I think it's fine.
But a couple of things changed over the years :)
>
> Brian
>
> [1] https://marc.info/?l=fstests&m=150902929900510&w=2
> [2] https://marc.info/?l=linux-xfs&m=154721212321112&w=2
>
> tests/generic/999 | 94 +++++++++++++++++++++++++++++++++++++++++++
> tests/generic/999.out | 2 +
> tests/generic/group | 1 +
> 3 files changed, 97 insertions(+)
> create mode 100755 tests/generic/999
> create mode 100644 tests/generic/999.out
>
> diff --git a/tests/generic/999 b/tests/generic/999
> new file mode 100755
> index 00000000..9e56a1e0
> --- /dev/null
> +++ b/tests/generic/999
> @@ -0,0 +1,94 @@
> +#! /bin/bash
> +# FS QA Test 999
> +#
> +# Test XFS page writeback code for races with the cached file mapping. XFS
> +# caches the file -> block mapping for a full extent once it is initially looked
> +# up. The cached mapping is used for all subsequent pages in the same writeback
> +# cycle that cover the associated extent. Under certain conditions, it is
> +# possible for concurrent operations on the file to invalidate the cached
> +# mapping without the knowledge of writeback. Writeback ends up sending I/O to a
> +# partly stale mapping and potentially leaving delalloc blocks in the current
> +# mapping unconverted.
> +#
> +#-----------------------------------------------------------------------
> +# Copyright (c) 2017 Red Hat, Inc. All Rights Reserved.
^^^^ 2019?
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> +#-----------------------------------------------------------------------
And please change this to SPDX-License-Identifier.
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + cd /
> + rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch
> +_require_test_program "feature"
_require_xfs_io_command "sync_range"
> +
> +_scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
> +_scratch_mount || _fail "mount failed"
_scratch_mount will _fail the test on failure now :)
> +
> +file=$SCRATCH_MNT/file
> +filesize=$((1024 * 1024 * 32))
> +pagesize=`src/feature -s`
> +truncsize=$((filesize - pagesize))
> +
> +for i in $(seq 0 15); do
> + # Truncate the file and fsync to persist the final size on-disk. This is
> + # required so the subsequent truncate will not wait on writeback.
> + $XFS_IO_PROG -fc "truncate 0" $file
> + $XFS_IO_PROG -c "truncate $filesize" -c fsync $file
> +
> + # create a small enough delalloc extent to likely be contiguous
> + $XFS_IO_PROG -c "pwrite 0 $filesize" $file >> $seqres.full 2>&1
> +
> + # Start writeback and a racing truncate and rewrite of the final page.
> + $XFS_IO_PROG -c "sync_range -w 0 0" $file &
> + sync_pid=$!
> + $XFS_IO_PROG -c "truncate $truncsize" \
> + -c "pwrite $truncsize $pagesize" $file >> $seqres.full 2>&1
> +
> + # If the test fails, the most likely outcome is an sb_fdblocks mismatch
> + # and/or an associated delalloc assert failure on inode reclaim. Cycle
> + # the mount to trigger detection.
> + wait $sync_pid
> + _scratch_cycle_mount || _fail "mount failed"
And _scratch_cycle_mount will exit the test on failure as well.
Thanks,
Eryu
> +done
> +
> +echo Silence is golden
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/999.out b/tests/generic/999.out
> new file mode 100644
> index 00000000..3b276ca8
> --- /dev/null
> +++ b/tests/generic/999.out
> @@ -0,0 +1,2 @@
> +QA output created by 999
> +Silence is golden
> diff --git a/tests/generic/group b/tests/generic/group
> index ea5aa7aa..ce165981 100644
> --- a/tests/generic/group
> +++ b/tests/generic/group
> @@ -525,3 +525,4 @@
> 520 auto quick log
> 521 soak long_rw
> 522 soak long_rw
> +999 auto quick
> --
> 2.17.2
>
* Re: [PATCH] tests/generic: test writepage cached mapping validity
2019-01-14 9:30 ` Eryu Guan
@ 2019-01-14 15:34 ` Brian Foster
2019-01-15 3:52 ` Dave Chinner
1 sibling, 0 replies; 9+ messages in thread
From: Brian Foster @ 2019-01-14 15:34 UTC (permalink / raw)
To: Eryu Guan; +Cc: fstests, linux-xfs
On Mon, Jan 14, 2019 at 05:30:36PM +0800, Eryu Guan wrote:
> On Fri, Jan 11, 2019 at 08:31:24AM -0500, Brian Foster wrote:
> > XFS has a bug where page writeback can end up sending data to the
> > wrong location due to a stale, cached file mapping. Add a test to
> > trigger this problem by racing background writeback with a
> > truncate/rewrite of the final page of the file.
> >
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >
> > Hi all,
> >
> > This is a resend of an old post[1] that never quite made it upstream. It
> > wasn't a big deal at the time because we didn't really have a proper fix
> > for the problem. I'm resending now because there is a proposed fix[2].
>
> Thanks for the resending!
>
> >
> > I've verified that this still reproduces the problem and no longer fails
> > with the fix applied (in hundreds of iters). Note that reproduction may
> > require many iterations. It took me anywhere from 5 to 30 or so on the
> > box I tested, which I think is reasonable for the tradeoff of a fairly
> > quick test. There was some discussion on the original post around making
> > the test run longer for a more reliable reproducer, but I'm not sure how
> > valuable that is given this is a targeted regression test. Thoughts
> > appreciated.
>
> It took me around 5 iterations to hit the corruption, I think it's fine.
>
> But a couple of things changed over the years :)
>
Indeed, these changes all sound good. I'll include them in v2, thanks!
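For reference, a sketch of how the setup and loop might read with Eryu's three review comments applied (the sync_range requirement, and dropping the redundant || _fail after _scratch_mount and _scratch_cycle_mount). This is not the actual v2 posting, and it depends on fstests' common/rc helpers, so it does not run on its own.

```shell
# Sketch only: assumes fstests' common/rc has been sourced.
_supported_fs generic
_supported_os Linux
_require_scratch
_require_test_program "feature"
_require_xfs_io_command "sync_range"	# skip if xfs_io lacks sync_range

_scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
_scratch_mount		# fails the test itself on error

file=$SCRATCH_MNT/file
filesize=$((1024 * 1024 * 32))
pagesize=`src/feature -s`
truncsize=$((filesize - pagesize))

for i in $(seq 0 15); do
	# persist the final size so the later truncate won't wait on writeback
	$XFS_IO_PROG -fc "truncate 0" $file
	$XFS_IO_PROG -c "truncate $filesize" -c fsync $file
	$XFS_IO_PROG -c "pwrite 0 $filesize" $file >> $seqres.full 2>&1

	# race writeback against a truncate and rewrite of the final page
	$XFS_IO_PROG -c "sync_range -w 0 0" $file &
	sync_pid=$!
	$XFS_IO_PROG -c "truncate $truncsize" \
		-c "pwrite $truncsize $pagesize" $file >> $seqres.full 2>&1

	wait $sync_pid
	_scratch_cycle_mount	# exits the test itself on failure
done
```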
Brian
> >
> > Brian
> >
> > [1] https://marc.info/?l=fstests&m=150902929900510&w=2
> > [2] https://marc.info/?l=linux-xfs&m=154721212321112&w=2
> >
> > tests/generic/999 | 94 +++++++++++++++++++++++++++++++++++++++++++
> > tests/generic/999.out | 2 +
> > tests/generic/group | 1 +
> > 3 files changed, 97 insertions(+)
> > create mode 100755 tests/generic/999
> > create mode 100644 tests/generic/999.out
> >
> > diff --git a/tests/generic/999 b/tests/generic/999
> > new file mode 100755
> > index 00000000..9e56a1e0
> > --- /dev/null
> > +++ b/tests/generic/999
> > @@ -0,0 +1,94 @@
> > +#! /bin/bash
> > +# FS QA Test 999
> > +#
> > +# Test XFS page writeback code for races with the cached file mapping. XFS
> > +# caches the file -> block mapping for a full extent once it is initially looked
> > +# up. The cached mapping is used for all subsequent pages in the same writeback
> > +# cycle that cover the associated extent. Under certain conditions, it is
> > +# possible for concurrent operations on the file to invalidate the cached
> > +# mapping without the knowledge of writeback. Writeback ends up sending I/O to a
> > +# partly stale mapping and potentially leaving delalloc blocks in the current
> > +# mapping unconverted.
> > +#
> > +#-----------------------------------------------------------------------
> > +# Copyright (c) 2017 Red Hat, Inc. All Rights Reserved.
> ^^^^ 2019?
> > +#
> > +# This program is free software; you can redistribute it and/or
> > +# modify it under the terms of the GNU General Public License as
> > +# published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope that it would be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program; if not, write the Free Software Foundation,
> > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> > +#-----------------------------------------------------------------------
>
> And please change this to SPDX-License-Identifier.
>
> > +#
> > +
> > +seq=`basename $0`
> > +seqres=$RESULT_DIR/$seq
> > +echo "QA output created by $seq"
> > +
> > +here=`pwd`
> > +tmp=/tmp/$$
> > +status=1 # failure is the default!
> > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > +
> > +_cleanup()
> > +{
> > + cd /
> > + rm -f $tmp.*
> > +}
> > +
> > +# get standard environment, filters and checks
> > +. ./common/rc
> > +
> > +# remove previous $seqres.full before test
> > +rm -f $seqres.full
> > +
> > +# real QA test starts here
> > +
> > +# Modify as appropriate.
> > +_supported_fs generic
> > +_supported_os Linux
> > +_require_scratch
> > +_require_test_program "feature"
>
> _require_xfs_io_command "sync_range"
>
> > +
> > +_scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
> > +_scratch_mount || _fail "mount failed"
>
> _scratch_mount will _fail the test on failure now :)
>
> > +
> > +file=$SCRATCH_MNT/file
> > +filesize=$((1024 * 1024 * 32))
> > +pagesize=`src/feature -s`
> > +truncsize=$((filesize - pagesize))
> > +
> > +for i in $(seq 0 15); do
> > + # Truncate the file and fsync to persist the final size on-disk. This is
> > + # required so the subsequent truncate will not wait on writeback.
> > + $XFS_IO_PROG -fc "truncate 0" $file
> > + $XFS_IO_PROG -c "truncate $filesize" -c fsync $file
> > +
> > + # create a small enough delalloc extent to likely be contiguous
> > + $XFS_IO_PROG -c "pwrite 0 $filesize" $file >> $seqres.full 2>&1
> > +
> > + # Start writeback and a racing truncate and rewrite of the final page.
> > + $XFS_IO_PROG -c "sync_range -w 0 0" $file &
> > + sync_pid=$!
> > + $XFS_IO_PROG -c "truncate $truncsize" \
> > + -c "pwrite $truncsize $pagesize" $file >> $seqres.full 2>&1
> > +
> > + # If the test fails, the most likely outcome is an sb_fdblocks mismatch
> > + # and/or an associated delalloc assert failure on inode reclaim. Cycle
> > + # the mount to trigger detection.
> > + wait $sync_pid
> > + _scratch_cycle_mount || _fail "mount failed"
>
> And _scratch_cycle_mount will exit the test on failure as well.
>
> Thanks,
> Eryu
>
> > +done
> > +
> > +echo Silence is golden
> > +
> > +# success, all done
> > +status=0
> > +exit
> > diff --git a/tests/generic/999.out b/tests/generic/999.out
> > new file mode 100644
> > index 00000000..3b276ca8
> > --- /dev/null
> > +++ b/tests/generic/999.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 999
> > +Silence is golden
> > diff --git a/tests/generic/group b/tests/generic/group
> > index ea5aa7aa..ce165981 100644
> > --- a/tests/generic/group
> > +++ b/tests/generic/group
> > @@ -525,3 +525,4 @@
> > 520 auto quick log
> > 521 soak long_rw
> > 522 soak long_rw
> > +999 auto quick
> > --
> > 2.17.2
> >
* Re: [PATCH] tests/generic: test writepage cached mapping validity
2019-01-14 9:30 ` Eryu Guan
2019-01-14 15:34 ` Brian Foster
@ 2019-01-15 3:52 ` Dave Chinner
1 sibling, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2019-01-15 3:52 UTC (permalink / raw)
To: Eryu Guan; +Cc: Brian Foster, fstests, linux-xfs
On Mon, Jan 14, 2019 at 05:30:36PM +0800, Eryu Guan wrote:
> On Fri, Jan 11, 2019 at 08:31:24AM -0500, Brian Foster wrote:
> > @@ -0,0 +1,94 @@
> > +#! /bin/bash
> > +# FS QA Test 999
> > +#
> > +# Test XFS page writeback code for races with the cached file mapping. XFS
> > +# caches the file -> block mapping for a full extent once it is initially looked
> > +# up. The cached mapping is used for all subsequent pages in the same writeback
> > +# cycle that cover the associated extent. Under certain conditions, it is
> > +# possible for concurrent operations on the file to invalidate the cached
> > +# mapping without the knowledge of writeback. Writeback ends up sending I/O to a
> > +# partly stale mapping and potentially leaving delalloc blocks in the current
> > +# mapping unconverted.
> > +#
> > +#-----------------------------------------------------------------------
> > +# Copyright (c) 2017 Red Hat, Inc. All Rights Reserved.
> ^^^^ 2019?
i.e. copyright is from when it was first posted if the current
posting is derived from the original posting. If significant
alterations are made, then a date update can occur, but the original
date should be preserved. It can be shortened to 2017-2019 for a
contiguous span of years...
So the correct form here is probably:
# Copyright (c) 2017, 2019 Red Hat, Inc. All Rights Reserved.
> > +#
> > +# This program is free software; you can redistribute it and/or
> > +# modify it under the terms of the GNU General Public License as
> > +# published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope that it would be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program; if not, write the Free Software Foundation,
> > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> > +#-----------------------------------------------------------------------
>
> And please change this to SPDX-License-Identifier.
*nod* :)
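Combining both header suggestions, the top of the script might look like the sketch below. GPL-2.0 is an assumption based on the existing boilerplate granting version 2 with no "or later" clause, and the line order is assumed to follow other SPDX-converted fstests tests.

```shell
#! /bin/bash
# SPDX-License-Identifier: GPL-2.0
# Copyright (c) 2017, 2019 Red Hat, Inc. All Rights Reserved.
#
# FS QA Test 999
#
# (test description comment unchanged from the posted patch)
```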
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
Thread overview: 9+ messages
2017-10-26 14:48 [PATCH] tests/generic: test writepage cached mapping validity Brian Foster
2017-10-26 15:34 ` Eryu Guan
2017-10-26 16:12 ` Brian Foster
2017-10-26 16:40 ` Eryu Guan
2017-10-26 17:17 ` Brian Foster
-- strict thread matches above, loose matches on Subject: below --
2019-01-11 12:30 [PATCH 0/4] xfs: properly invalidate cached writeback mapping Brian Foster
2019-01-11 13:31 ` [PATCH] tests/generic: test writepage cached mapping validity Brian Foster
2019-01-14 9:30 ` Eryu Guan
2019-01-14 15:34 ` Brian Foster
2019-01-15 3:52 ` Dave Chinner