* [PATCH V2] xfstests: Test filesystem lockup on full overprovisioned dm-thin
@ 2017-07-13 13:12 Carlos Maiolino
2017-07-13 15:42 ` Brian Foster
0 siblings, 1 reply; 4+ messages in thread
From: Carlos Maiolino @ 2017-07-13 13:12 UTC (permalink / raw)
To: fstests
With thin devices, it's possible to have a virtual device larger than
the physical device itself, and such situation can cause problems to
filesystems, once the filesystem 'believe' to have more space than it
actually has.
This can lead the filesystem to several weird behaviors. The one tested
here is filesystem lockup.
In case of XFS, it locks up when trying to writeback AIL metadata back
to the filesystem, but, once there is no physical space available, XFS
locks up and do not gracefuly handle this case.
Other filesystems usually are remounted as read-only, so they already
have this situation covered.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
---
Changelog:
V2: - Add extra comments regarding the usage of snapshots
- more extra comments regarding why consistency check isn't
required
- Remove -n argument from `wait`
tests/generic/447 | 151 ++++++++++++++++++++++++++++++++++++++++++++++++++
tests/generic/447.out | 2 +
tests/generic/group | 1 +
3 files changed, 154 insertions(+)
create mode 100755 tests/generic/447
create mode 100644 tests/generic/447.out
diff --git a/tests/generic/447 b/tests/generic/447
new file mode 100755
index 0000000..45c4af8
--- /dev/null
+++ b/tests/generic/447
@@ -0,0 +1,151 @@
+#! /bin/bash
+# FS QA Test 424
+#
+# Test buffer filesystem error recovery during a full overcommited dm-thin device.
+#
+# When a dm-thin device reaches its full capacity, but the virtual device still
+# shows available space, the filesystem should be able to handle such cases
+# failing its operation without locking up.
+#
+# This test has been created first to cover a XFS problem where it loops
+# indefinitely in xfsaild due items still in AIL. The buffers containing such
+# items couldn't be resubmitted because the items were flush locked.
+# But, once this doesn't require any special filesystem feature to be executed,
+# this has been integrated as a generic test.
+#
+# This test might hang the filesystem when ran on an unpatched kernel
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2017 Red Hat, Inc. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1 # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+ cd /
+ rm -f $tmp.*
+ $UMOUNT_PROG $SCRATCH_MNT >>$seqres.full 2>&1
+ $LVM_PROG vgremove -ff $vgname >>$seqres.full 2>&1
+ $LVM_PROG pvremove -ff $SCRATCH_DEV >>$seqres.full 2>&1
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+
+# real QA test starts here
+
+# This tests for filesystem lockup not consistency, so don't check for fs
+# consistency after test
+_supported_fs generic
+_supported_os Linux
+_require_scratch_nocheck
+_require_dm_target thin-pool
+_require_command $LVM_PROG lvm
+_require_freeze
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+vgname=vg_$seq
+lvname=lv_$seq
+poolname=pool_$seq
+snapname=snap_$seq
+origpsize=100
+virtsize=200
+newpsize=200
+
+# Ensure we have enough disk space
+_scratch_mkfs_sized $((250 * 1024 * 1024)) >>$seqres.full 2>&1
+
+# Create a 100MB dm-thin POOL
+$LVM_PROG pvcreate -f $SCRATCH_DEV >>$seqres.full 2>&1
+$LVM_PROG vgcreate -f $vgname $SCRATCH_DEV >>$seqres.full 2>&1
+
+$LVM_PROG lvcreate --thinpool $poolname --errorwhenfull y \
+ --zero n -L $origpsize \
+ --poolmetadatasize 4M $vgname >>$seqres.full 2>&1
+
+# Create a overprovisioned 200MB dm-thin virt. device
+$LVM_PROG lvcreate --virtualsize $virtsize \
+ -T $vgname/$poolname \
+ -n $lvname >>$seqres.full 2>&1
+
+_mkfs_dev /dev/mapper/$vgname-$lvname >>$seqres.full 2>&1
+
+
+# Running the test over the original volume doesn't reproduce the problem
+# reliably, while, running it over a snapshot, makes the problem 100%
+# reproducible, so, create a snapshot and run the test over it.
+$LVM_PROG lvcreate -k n -s $vgname/$lvname \
+ -n $snapname >>$seqres.full 2>&1
+
+_mount /dev/mapper/$vgname-$snapname $SCRATCH_MNT
+
+# Consume all space available in the volume and freeze to ensure everything
+# required to make the fs consistent is flushed to disk.
+$XFS_IO_PROG -f -d -c 'pwrite -b 1m 0 120m' $SCRATCH_MNT/f1 >>$seqres.full 2>&1
+
+# In XFS, this freeze will never complete until the dm-thin POOL device is
+# extended. It is expected, and is only used so xfsaild is triggered to
+# flush AIL items, other filesystems usually get remounted as read-only during
+# the above write process.
+fsfreeze -f $SCRATCH_MNT >>$seqres.full 2>&1 &
+freezeid=$!
+
+# Wait enough so xfsaild can run
+sleep 10
+
+# Make some extra space available so the freeze above can proceed
+$LVM_PROG lvextend -L $newpsize $vgname/$poolname >>$seqres.full 2>&1
+
+wait $freezeid
+ret=$?
+
+
+# Different filesystems will handle the lack of real space in different ways,
+# some will remount the filesystem in read-only mode, some will not. These tests
+# will check if:
+# - The filesystem turns into Read-Only and reject further writes
+# - The filesystem stays in Read-Write mode, but can be frozen/thawed
+# without getting stuck.
+ISRO=$(_fs_options /dev/mapper/$vgname-$snapname | grep -w "ro")
+
+if [ $ret -ne 0 ]; then
+ if [ -n "$ISRO" ]; then
+ echo "Test OK"
+ else
+ echo "Freeze failed and FS isn't Read-Only. Test Failed"
+ status=1
+ exit
+ fi
+else
+ # Try to thaw the filesystem, and complete test if if succeed.
+ # NOTE: This will hang on affected XFS filesystems.
+ fsfreeze -u $SCRATCH_MNT >>$seqres.full 2>&1
+ echo "Test OK"
+fi
+
+status=0
+exit
diff --git a/tests/generic/447.out b/tests/generic/447.out
new file mode 100644
index 0000000..3531fb9
--- /dev/null
+++ b/tests/generic/447.out
@@ -0,0 +1,2 @@
+QA output created by 447
+Test OK
diff --git a/tests/generic/group b/tests/generic/group
index 8c1e21a..6a360fa 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -449,3 +449,4 @@
444 auto quick acl
445 auto quick rw
446 auto quick rw dangerous
+447 auto dangerous
--
2.9.4
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH V2] xfstests: Test filesystem lockup on full overprovisioned dm-thin
2017-07-13 13:12 [PATCH V2] xfstests: Test filesystem lockup on full overprovisioned dm-thin Carlos Maiolino
@ 2017-07-13 15:42 ` Brian Foster
2017-07-14 3:46 ` Eryu Guan
0 siblings, 1 reply; 4+ messages in thread
From: Brian Foster @ 2017-07-13 15:42 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: fstests
On Thu, Jul 13, 2017 at 03:12:08PM +0200, Carlos Maiolino wrote:
> With thin devices, it's possible to have a virtual device larger than
> the physical device itself, and such situation can cause problems to
> filesystems, once the filesystem 'believe' to have more space than it
> actually has.
>
> This can lead the filesystem to several weird behaviors. The one tested
> here is filesystem lockup.
>
> In case of XFS, it locks up when trying to writeback AIL metadata back
> to the filesystem, but, once there is no physical space available, XFS
> locks up and do not gracefuly handle this case.
>
> Other filesystems usually are remounted as read-only, so they already
> have this situation covered.
>
> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> ---
Looks good to me. Any lingering concerns from Eryu aside:
Reviewed-by: Brian Foster <bfoster@redhat.com>
Thanks for the fixups.
Brian
> Changelog:
>
> V2: - Add extra comments regarding the usage of snapshots
> - more extra comments regarding why consistency check isn't
> required
> - Remove -n argument from `wait`
>
> tests/generic/447 | 151 ++++++++++++++++++++++++++++++++++++++++++++++++++
> tests/generic/447.out | 2 +
> tests/generic/group | 1 +
> 3 files changed, 154 insertions(+)
> create mode 100755 tests/generic/447
> create mode 100644 tests/generic/447.out
>
> diff --git a/tests/generic/447 b/tests/generic/447
> new file mode 100755
> index 0000000..45c4af8
> --- /dev/null
> +++ b/tests/generic/447
> @@ -0,0 +1,151 @@
> +#! /bin/bash
> +# FS QA Test 424
> +#
> +# Test buffer filesystem error recovery during a full overcommited dm-thin device.
> +#
> +# When a dm-thin device reaches its full capacity, but the virtual device still
> +# shows available space, the filesystem should be able to handle such cases
> +# failing its operation without locking up.
> +#
> +# This test has been created first to cover a XFS problem where it loops
> +# indefinitely in xfsaild due items still in AIL. The buffers containing such
> +# items couldn't be resubmitted because the items were flush locked.
> +# But, once this doesn't require any special filesystem feature to be executed,
> +# this has been integrated as a generic test.
> +#
> +# This test might hang the filesystem when ran on an unpatched kernel
> +#
> +#-----------------------------------------------------------------------
> +# Copyright (c) 2017 Red Hat, Inc. All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> +#-----------------------------------------------------------------------
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + cd /
> + rm -f $tmp.*
> + $UMOUNT_PROG $SCRATCH_MNT >>$seqres.full 2>&1
> + $LVM_PROG vgremove -ff $vgname >>$seqres.full 2>&1
> + $LVM_PROG pvremove -ff $SCRATCH_DEV >>$seqres.full 2>&1
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +
> +# real QA test starts here
> +
> +# This tests for filesystem lockup not consistency, so don't check for fs
> +# consistency after test
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch_nocheck
> +_require_dm_target thin-pool
> +_require_command $LVM_PROG lvm
> +_require_freeze
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +vgname=vg_$seq
> +lvname=lv_$seq
> +poolname=pool_$seq
> +snapname=snap_$seq
> +origpsize=100
> +virtsize=200
> +newpsize=200
> +
> +# Ensure we have enough disk space
> +_scratch_mkfs_sized $((250 * 1024 * 1024)) >>$seqres.full 2>&1
> +
> +# Create a 100MB dm-thin POOL
> +$LVM_PROG pvcreate -f $SCRATCH_DEV >>$seqres.full 2>&1
> +$LVM_PROG vgcreate -f $vgname $SCRATCH_DEV >>$seqres.full 2>&1
> +
> +$LVM_PROG lvcreate --thinpool $poolname --errorwhenfull y \
> + --zero n -L $origpsize \
> + --poolmetadatasize 4M $vgname >>$seqres.full 2>&1
> +
> +# Create a overprovisioned 200MB dm-thin virt. device
> +$LVM_PROG lvcreate --virtualsize $virtsize \
> + -T $vgname/$poolname \
> + -n $lvname >>$seqres.full 2>&1
> +
> +_mkfs_dev /dev/mapper/$vgname-$lvname >>$seqres.full 2>&1
> +
> +
> +# Running the test over the original volume doesn't reproduce the problem
> +# reliably, while, running it over a snapshot, makes the problem 100%
> +# reproducible, so, create a snapshot and run the test over it.
> +$LVM_PROG lvcreate -k n -s $vgname/$lvname \
> + -n $snapname >>$seqres.full 2>&1
> +
> +_mount /dev/mapper/$vgname-$snapname $SCRATCH_MNT
> +
> +# Consume all space available in the volume and freeze to ensure everything
> +# required to make the fs consistent is flushed to disk.
> +$XFS_IO_PROG -f -d -c 'pwrite -b 1m 0 120m' $SCRATCH_MNT/f1 >>$seqres.full 2>&1
> +
> +# In XFS, this freeze will never complete until the dm-thin POOL device is
> +# extended. It is expected, and is only used so xfsaild is triggered to
> +# flush AIL items, other filesystems usually get remounted as read-only during
> +# the above write process.
> +fsfreeze -f $SCRATCH_MNT >>$seqres.full 2>&1 &
> +freezeid=$!
> +
> +# Wait enough so xfsaild can run
> +sleep 10
> +
> +# Make some extra space available so the freeze above can proceed
> +$LVM_PROG lvextend -L $newpsize $vgname/$poolname >>$seqres.full 2>&1
> +
> +wait $freezeid
> +ret=$?
> +
> +
> +# Different filesystems will handle the lack of real space in different ways,
> +# some will remount the filesystem in read-only mode, some will not. These tests
> +# will check if:
> +# - The filesystem turns into Read-Only and reject further writes
> +# - The filesystem stays in Read-Write mode, but can be frozen/thawed
> +# without getting stuck.
> +ISRO=$(_fs_options /dev/mapper/$vgname-$snapname | grep -w "ro")
> +
> +if [ $ret -ne 0 ]; then
> + if [ -n "$ISRO" ]; then
> + echo "Test OK"
> + else
> + echo "Freeze failed and FS isn't Read-Only. Test Failed"
> + status=1
> + exit
> + fi
> +else
> + # Try to thaw the filesystem, and complete test if if succeed.
> + # NOTE: This will hang on affected XFS filesystems.
> + fsfreeze -u $SCRATCH_MNT >>$seqres.full 2>&1
> + echo "Test OK"
> +fi
> +
> +status=0
> +exit
> diff --git a/tests/generic/447.out b/tests/generic/447.out
> new file mode 100644
> index 0000000..3531fb9
> --- /dev/null
> +++ b/tests/generic/447.out
> @@ -0,0 +1,2 @@
> +QA output created by 447
> +Test OK
> diff --git a/tests/generic/group b/tests/generic/group
> index 8c1e21a..6a360fa 100644
> --- a/tests/generic/group
> +++ b/tests/generic/group
> @@ -449,3 +449,4 @@
> 444 auto quick acl
> 445 auto quick rw
> 446 auto quick rw dangerous
> +447 auto dangerous
> --
> 2.9.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH V2] xfstests: Test filesystem lockup on full overprovisioned dm-thin
2017-07-13 15:42 ` Brian Foster
@ 2017-07-14 3:46 ` Eryu Guan
2017-07-17 10:51 ` Carlos Maiolino
0 siblings, 1 reply; 4+ messages in thread
From: Eryu Guan @ 2017-07-14 3:46 UTC (permalink / raw)
To: Brian Foster; +Cc: Carlos Maiolino, fstests
On Thu, Jul 13, 2017 at 11:42:59AM -0400, Brian Foster wrote:
> On Thu, Jul 13, 2017 at 03:12:08PM +0200, Carlos Maiolino wrote:
> > With thin devices, it's possible to have a virtual device larger than
> > the physical device itself, and such situation can cause problems to
> > filesystems, once the filesystem 'believe' to have more space than it
> > actually has.
> >
> > This can lead the filesystem to several weird behaviors. The one tested
> > here is filesystem lockup.
> >
> > In case of XFS, it locks up when trying to writeback AIL metadata back
> > to the filesystem, but, once there is no physical space available, XFS
> > locks up and do not gracefuly handle this case.
> >
> > Other filesystems usually are remounted as read-only, so they already
> > have this situation covered.
> >
> > Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Thanks for the update!
> > ---
>
> Looks good to me. Any lingering concerns from Eryu aside:
>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
Looks good to me too, thanks for the review!
Eryu
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH V2] xfstests: Test filesystem lockup on full overprovisioned dm-thin
2017-07-14 3:46 ` Eryu Guan
@ 2017-07-17 10:51 ` Carlos Maiolino
0 siblings, 0 replies; 4+ messages in thread
From: Carlos Maiolino @ 2017-07-17 10:51 UTC (permalink / raw)
To: Eryu Guan; +Cc: Brian Foster, fstests
On Fri, Jul 14, 2017 at 11:46:40AM +0800, Eryu Guan wrote:
> On Thu, Jul 13, 2017 at 11:42:59AM -0400, Brian Foster wrote:
> > On Thu, Jul 13, 2017 at 03:12:08PM +0200, Carlos Maiolino wrote:
> > > With thin devices, it's possible to have a virtual device larger than
> > > the physical device itself, and such situation can cause problems to
> > > filesystems, once the filesystem 'believe' to have more space than it
> > > actually has.
> > >
> > > This can lead the filesystem to several weird behaviors. The one tested
> > > here is filesystem lockup.
> > >
> > > In case of XFS, it locks up when trying to writeback AIL metadata back
> > > to the filesystem, but, once there is no physical space available, XFS
> > > locks up and do not gracefuly handle this case.
> > >
> > > Other filesystems usually are remounted as read-only, so they already
> > > have this situation covered.
> > >
> > > Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
>
Somehow these messages were marked as read on my MUA.
Thanks for the review folks.
Time to finish the kernel fix for this test
Cheers
--
Carlos
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2017-07-17 10:51 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-07-13 13:12 [PATCH V2] xfstests: Test filesystem lockup on full overprovisioned dm-thin Carlos Maiolino
2017-07-13 15:42 ` Brian Foster
2017-07-14 3:46 ` Eryu Guan
2017-07-17 10:51 ` Carlos Maiolino
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox