linux-btrfs.vger.kernel.org archive mirror
* [PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim
@ 2014-11-26 15:30 Filipe Manana
  2014-12-01  5:25 ` Dave Chinner
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Filipe Manana @ 2014-11-26 15:30 UTC (permalink / raw)
  To: fstests; +Cc: linux-btrfs, Filipe Manana

Stress btrfs' block group allocation and deallocation while running
fstrim in parallel. Part of the goal is also to get data block groups
deallocated so that new metadata block groups, using the same physical
device space ranges, get allocated while fstrim is running. This caused
several issues ranging from invalid memory accesses, kernel crashes,
metadata or data corruption, free space cache inconsistencies and free
space leaks.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 tests/btrfs/082     | 148 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/082.out |   2 +
 tests/btrfs/group   |   1 +
 3 files changed, 151 insertions(+)
 create mode 100755 tests/btrfs/082
 create mode 100644 tests/btrfs/082.out

diff --git a/tests/btrfs/082 b/tests/btrfs/082
new file mode 100755
index 0000000..8ac9f06
--- /dev/null
+++ b/tests/btrfs/082
@@ -0,0 +1,148 @@
+#! /bin/bash
+# FSQA Test No. 082
+#
+# Stress btrfs' block group allocation and deallocation while running fstrim in
+# parallel. Part of the goal is also to get data block groups deallocated so
+# that new metadata block groups, using the same physical device space ranges,
+# get allocated while fstrim is running. This caused several issues ranging
+# from invalid memory accesses, kernel crashes, metadata or data corruption,
+# free space cache inconsistencies and free space leaks.
+#
+# These issues were fixed by the following btrfs linux kernel patches:
+#
+#   Btrfs: fix invalid block group rbtree access after bg is removed
+#   Btrfs: fix crash caused by block group removal
+#   Btrfs: fix freeing used extents after removing empty block group
+#   Btrfs: fix race between fs trimming and block group remove/allocation
+#   Btrfs: fix race between writing free space cache and trimming
+#   Btrfs: make btrfs_abort_transaction consider existence of new block groups
+#
+# The issues were found on a qemu/kvm guest with 4 virtual CPUs, 4GB of RAM and
+# scsi-hd devices with discard support enabled (that means hole punching in the
+# disk's image file is performed by the host).
+#
+#-----------------------------------------------------------------------
+#
+# Copyright (C) 2014 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana <fdmanana@suse.com>
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	rm -fr $tmp
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch_nocheck
+_require_fstrim
+
+rm -f $seqres.full
+
+# Keep allocating and deallocating 2G of data space with the goal of creating
+# and deleting 2 block groups constantly. The intention is to race with the
+# fstrim loop below.
+fallocate_loop()
+{
+	local name=$1
+	while true; do
+		$XFS_IO_PROG -f -c "falloc -k 0 2G" \
+			$SCRATCH_MNT/$name &> /dev/null
+		sleep 3
+		$XFS_IO_PROG -c "truncate 0" \
+			$SCRATCH_MNT/$name &> /dev/null
+		sleep 3
+	done
+}
+
+trim_loop()
+{
+	while true; do
+		$FSTRIM_PROG $SCRATCH_MNT
+	done
+}
+
+# Create a bunch of small files that get their single extent inlined in the
+# btree, so that we consume a lot of metadata space and get a chance of a
+# data block group getting deleted and reused for metadata later. Sometimes
+# the creation of all these files succeeds, other times we get ENOSPC failures
+# at some point - this depends on how fast the btrfs' cleaner kthread is
+# notified about empty block groups, how fast it deletes them and how fast
+# the fallocate calls happen. So we don't really care if they all succeed or
+# not, the goal is just to keep metadata space usage growing while data block
+# groups are deleted.
+create_files()
+{
+	local prefix=$1
+
+	for ((i = 1; i <= 400000; i++)); do
+		echo "Creating file ${prefix}_$i" >>$seqres.full 2>&1
+		$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 3900" \
+			$SCRATCH_MNT/"${prefix}_$i" >>$seqres.full 2>&1
+		ret=$?
+		if [ $ret -ne 0 ]; then
+			break
+		fi
+	done
+
+}
+
+fsz=`expr 40 \* 1024 \* 1024 \* 1024`
+_scratch_mkfs_sized $fsz >>$seqres.full 2>&1 || \
+	_fail "size=$fsz mkfs failed"
+_scratch_mount
+
+for ((i = 0; i < 4; i++)); do
+	trim_loop &
+	trim_pids[$i]=$!
+done
+
+fallocate_loop "falloc_file" &
+fallocate_pid=$!
+
+create_files "foobar"
+
+kill $fallocate_pid
+kill ${trim_pids[@]}
+wait
+
+# Sleep a bit, otherwise umount fails often with EBUSY (TODO: investigate why).
+sleep 3
+
+# Check for fs consistency. The trimming was racy and caused some btree nodes
+# to get full of zeroes on disk, which obviously caused fs metadata corruption.
+# The race often led to missing free space entries in a block group's free
+# space cache too.
+_check_scratch_fs
+
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/btrfs/082.out b/tests/btrfs/082.out
new file mode 100644
index 0000000..2977f14
--- /dev/null
+++ b/tests/btrfs/082.out
@@ -0,0 +1,2 @@
+QA output created by 082
+Silence is golden
diff --git a/tests/btrfs/group b/tests/btrfs/group
index e79b848..6608005 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -84,3 +84,4 @@
 079 auto
 080 auto
 081 auto quick
+082 auto
-- 
2.1.3



* Re: [PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim
  2014-11-26 15:30 [PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim Filipe Manana
@ 2014-12-01  5:25 ` Dave Chinner
  2014-12-01 17:11 ` Filipe Manana
  2014-12-02 18:10 ` [PATCH v3] " Filipe Manana
  2 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2014-12-01  5:25 UTC (permalink / raw)
  To: Filipe Manana; +Cc: fstests, linux-btrfs

On Wed, Nov 26, 2014 at 03:30:39PM +0000, Filipe Manana wrote:
> Stress btrfs' block group allocation and deallocation while running
> fstrim in parallel. Part of the goal is also to get data block groups
> deallocated so that new metadata block groups, using the same physical
> device space ranges, get allocated while fstrim is running. This caused
> several issues ranging from invalid memory accesses, kernel crashes,
> metadata or data corruption, free space cache inconsistencies and free
> space leaks.
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

There's nothing btrfs specific about this test. Please make it
generic.

....

> +
> +# real QA test starts here
> +_need_to_be_root
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch_nocheck
> +_require_fstrim
> +
> +rm -f $seqres.full

# needs 40GB of space in the filesystem
_scratch_mkfs
_require_fs_space $SCRATCH_MNT $((40 * 1024 * 1024))	

However, does it really need 40GB? It needs 2GB for the large alloc,
and then 400,000 * 4k is only 1.6GB. So this would fit in a 10GB
filesystem without a problem, right? And if it's a generic test,
keeping it under 10GB would mean it runs on the majority of
filesystem developers' test VMs, small or large....
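
The arithmetic above can be checked with a small sketch. It assumes
each small file ends up consuming about 4k and ignores any other
metadata overhead; the variable names are illustrative and do not
come from the test itself.

```shell
#!/bin/bash
# Rough space estimate for the test, in bytes.
# Assumption: each small file costs ~4k; other metadata is ignored.
falloc_bytes=$((2 * 1024 * 1024 * 1024))	# one 2G fallocate
nr_files=400000
per_file_bytes=4096
files_bytes=$((nr_files * per_file_bytes))
total_bytes=$((falloc_bytes + files_bytes))
limit_bytes=$((10 * 1024 * 1024 * 1024))	# proposed 10GB ceiling

echo "small files: $files_bytes bytes"
echo "total:       $total_bytes bytes"
if [ "$total_bytes" -lt "$limit_bytes" ]; then
	echo "fits in a 10GB filesystem"
fi
```

Under these assumptions the total stays below 4GB, which leaves a
lot of headroom in a 10GB filesystem.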


> +# Create a bunch of small files that get their single extent inlined in the
> +# btree, so that we consume a lot of metadata space and get a chance of a
> +# data block group getting deleted and reused for metadata later. Sometimes
> +# the creation of all these files succeeds, other times we get ENOSPC failures
> +# at some point - this depends on how fast the btrfs' cleaner kthread is
> +# notified about empty block groups, how fast it deletes them and how fast
> +# the fallocate calls happen. So we don't really care if they all succeed or
> +# not, the goal is just to keep metadata space usage growing while data block
> +# groups are deleted.
> +create_files()
> +{
> +	local prefix=$1
> +
> +	for ((i = 1; i <= 400000; i++)); do
> +		echo "Creating file ${prefix}_$i" >>$seqres.full 2>&1
> +		$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 3900" \
> +			$SCRATCH_MNT/"${prefix}_$i" >>$seqres.full 2>&1

You don't need to echo 400,000 file creates to $seqres.full.

This is one of those times that directing output to /dev/null makes
sense, especially as:

> +		ret=$?
> +		if [ $ret -ne 0 ]; then
> +			break
> +		fi

you can do this:

		if [ $? -ne 0 ]; then
			echo "failed creating file ${prefix}_$i" >> $seqres.full
			break
		fi
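
As a standalone sketch of that pattern (discard the per-file output,
log only the first failure), something like the following could
work. It is not the test's code: a plain shell redirection stands in
for the xfs_io pwrite, and the dir, count and log parameters are
hypothetical stand-ins for $SCRATCH_MNT, the 400000 limit and
$seqres.full.

```shell
#!/bin/bash
# Create 'count' small files quietly; log only the first failure.
# Assumption: a shell redirection replaces the xfs_io pwrite here.
create_files()
{
	local dir=$1 prefix=$2 count=$3 log=$4
	local i

	for ((i = 1; i <= count; i++)); do
		# Per-file output and error messages are discarded, only
		# the exit status is used.
		{ printf 'x' > "$dir/${prefix}_$i"; } 2>/dev/null
		if [ $? -ne 0 ]; then
			echo "failed creating file ${prefix}_$i" >> "$log"
			break
		fi
	done
}
```

Calling it as create_files /some/dir foobar 100 /some/log leaves the
log untouched when every create succeeds and appends a single line
to it when one fails.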

> +	done
> +
> +}
> +
> +fsz=`expr 40 \* 1024 \* 1024 \* 1024`
> +_scratch_mkfs_sized $fsz >>$seqres.full 2>&1 || \
> +	_fail "size=$fsz mkfs failed"
> +_scratch_mount
> +
> +for ((i = 0; i < 4; i++)); do
> +	trim_loop &
> +	trim_pids[$i]=$!
> +done
> +
> +fallocate_loop "falloc_file" &
> +fallocate_pid=$!
> +
> +create_files "foobar"
> +
> +kill $fallocate_pid
> +kill ${trim_pids[@]}
> +wait
> +
> +# Sleep a bit, otherwise umount fails often with EBUSY (TODO: investigate why).
> +sleep 3
> +
> +# Check for fs consistency. The trimming was racy and caused some btree nodes
> +# to get full of zeroes on disk, which obviously caused fs metadata corruption.
> +# The race often led to missing free space entries in a block group's free
> +# space cache too.
> +_check_scratch_fs

Ummm, if you just use _require_scratch, you don't need to do this.
The test harness will check it for you.

> index e79b848..6608005 100644
> --- a/tests/btrfs/group
> +++ b/tests/btrfs/group
> @@ -84,3 +84,4 @@
>  079 auto
>  080 auto
>  081 auto quick
> +082 auto

I'd suggest that for a generic test we'd want to add the stress
group to this, and allow the test to be scaled in terms of
filesystem size and the number of concurrent trim and fallocate
loops by $LOAD_FACTOR....
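
A minimal sketch of that kind of scaling, with a dummy worker in
place of the trim and fallocate loops. It assumes LOAD_FACTOR is a
positive integer and defaults it to 1 so the snippet runs on its
own; worker_loop and nr_workers are illustrative names.

```shell
#!/bin/bash
# Scale the number of background loops by LOAD_FACTOR.
# Assumption: LOAD_FACTOR is a positive integer (defaulted to 1 here).
LOAD_FACTOR=${LOAD_FACTOR:-1}

# Placeholder for a real stress loop such as the fstrim one.
worker_loop()
{
	while true; do
		sleep 1
	done
}

nr_workers=$((4 * LOAD_FACTOR))
pids=()
for ((i = 0; i < nr_workers; i++)); do
	worker_loop &
	pids[$i]=$!
done

# Stop every worker and reap them before carrying on.
kill "${pids[@]}"
wait
echo "stopped ${#pids[@]} workers"
```

Recording each PID in an array keeps the teardown to a single kill
followed by wait, however far the load is scaled up.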

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* [PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim
  2014-11-26 15:30 [PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim Filipe Manana
  2014-12-01  5:25 ` Dave Chinner
@ 2014-12-01 17:11 ` Filipe Manana
  2014-12-02  3:06   ` Eryu Guan
  2014-12-02 18:10 ` [PATCH v3] " Filipe Manana
  2 siblings, 1 reply; 5+ messages in thread
From: Filipe Manana @ 2014-12-01 17:11 UTC (permalink / raw)
  To: fstests; +Cc: linux-btrfs, Filipe Manana

Stress btrfs' block group allocation and deallocation while running
fstrim in parallel. Part of the goal is also to get data block groups
deallocated so that new metadata block groups, using the same physical
device space ranges, get allocated while fstrim is running. This caused
several issues ranging from invalid memory accesses, kernel crashes,
metadata or data corruption, free space cache inconsistencies, free
space leaks and memory leaks.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---

V2: Addressed Dave's comments.

 tests/generic/038     | 152 ++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/038.out |   2 +
 tests/generic/group   |   1 +
 3 files changed, 155 insertions(+)
 create mode 100755 tests/generic/038
 create mode 100644 tests/generic/038.out

diff --git a/tests/generic/038 b/tests/generic/038
new file mode 100755
index 0000000..217aa7a
--- /dev/null
+++ b/tests/generic/038
@@ -0,0 +1,152 @@
+#! /bin/bash
+# FSQA Test No. 038
+#
+# This test was motivated by btrfs issues, but it's generic enough as it
+# doesn't use any btrfs specific features.
+#
+# Stress btrfs' block group allocation and deallocation while running fstrim in
+# parallel. Part of the goal is also to get data block groups deallocated so
+# that new metadata block groups, using the same physical device space ranges,
+# get allocated while fstrim is running. This caused several issues ranging
+# from invalid memory accesses, kernel crashes, metadata or data corruption,
+# free space cache inconsistencies, free space leaks and memory leaks.
+#
+# These issues were fixed by the following btrfs linux kernel patches:
+#
+#   Btrfs: fix invalid block group rbtree access after bg is removed
+#   Btrfs: fix crash caused by block group removal
+#   Btrfs: fix freeing used extents after removing empty block group
+#   Btrfs: fix race between fs trimming and block group remove/allocation
+#   Btrfs: fix race between writing free space cache and trimming
+#   Btrfs: make btrfs_abort_transaction consider existence of new block groups
+#   Btrfs: fix memory leak after block remove + trimming
+#   Btrfs: fix extent map leak on chunk allocation failure
+#
+# The issues were found on a qemu/kvm guest with 4 virtual CPUs, 4GB of RAM and
+# scsi-hd devices with discard support enabled (that means hole punching in the
+# disk's image file is performed by the host).
+#
+#-----------------------------------------------------------------------
+#
+# Copyright (C) 2014 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana <fdmanana@suse.com>
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	rm -fr $tmp
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_fstrim
+
+rm -f $seqres.full
+
+# Keep allocating and deallocating 1G of data space with the goal of creating
+# and deleting 1 block group constantly. The intention is to race with the
+# fstrim loop below.
+fallocate_loop()
+{
+	local name=$1
+	while true; do
+		$XFS_IO_PROG -f -c "falloc -k 0 1G" \
+			$SCRATCH_MNT/$name &> /dev/null
+		sleep 3
+		$XFS_IO_PROG -c "truncate 0" \
+			$SCRATCH_MNT/$name &> /dev/null
+		sleep 3
+	done
+}
+
+trim_loop()
+{
+	while true; do
+		$FSTRIM_PROG $SCRATCH_MNT
+	done
+}
+
+# Create a bunch of small files that get their single extent inlined in the
+# btree, so that we consume a lot of metadata space and get a chance of a
+# data block group getting deleted and reused for metadata later. Sometimes
+# the creation of all these files succeeds, other times we get ENOSPC failures
+# at some point - this depends on how fast the btrfs' cleaner kthread is
+# notified about empty block groups, how fast it deletes them and how fast
+# the fallocate calls happen. So we don't really care if they all succeed or
+# not, the goal is just to keep metadata space usage growing while data block
+# groups are deleted.
+create_files()
+{
+	local prefix=$1
+
+	for ((i = 1; i <= 400000; i++)); do
+		$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 3900" \
+			$SCRATCH_MNT/"${prefix}_$i" &> /dev/null
+		if [ $? -ne 0 ]; then
+			echo "Failed creating file ${prefix}_$i" >>$seqres.full
+			break
+		fi
+	done
+
+}
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+_require_fs_space $SCRATCH_MNT $((10 * 1024 * 1024))
+
+for ((i = 0; i < $((4 * $LOAD_FACTOR)); i++)); do
+	trim_loop &
+	trim_pids[$i]=$!
+done
+
+for ((i = 0; i < $((1 * $LOAD_FACTOR)); i++)); do
+	fallocate_loop "falloc_file_$i" &
+	fallocate_pids[$i]=$!
+done
+
+create_files "foobar"
+
+kill ${fallocate_pids[@]}
+kill ${trim_pids[@]}
+wait
+
+# Sleep a bit, otherwise umount fails often with EBUSY (TODO: investigate why).
+sleep 3
+
+# The fstests framework will now check for fs consistency with fsck.
+# The trimming was racy and caused some btree nodes to get full of zeroes on
+# disk, which obviously caused fs metadata corruption. The race often led
+# to missing free space entries in a block group's free space cache too.
+
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/generic/038.out b/tests/generic/038.out
new file mode 100644
index 0000000..5e0f13e
--- /dev/null
+++ b/tests/generic/038.out
@@ -0,0 +1,2 @@
+QA output created by 038
+Silence is golden
diff --git a/tests/generic/group b/tests/generic/group
index 9f355fc..1e89848 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -40,6 +40,7 @@
 035 auto quick
 036 auto aio rw stress
 037 metadata auto quick
+038 auto stress
 053 acl repair auto quick
 062 attr udf auto quick
 068 other auto freeze dangerous stress
-- 
2.1.3



* Re: [PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim
  2014-12-01 17:11 ` Filipe Manana
@ 2014-12-02  3:06   ` Eryu Guan
  0 siblings, 0 replies; 5+ messages in thread
From: Eryu Guan @ 2014-12-02  3:06 UTC (permalink / raw)
  To: Filipe Manana; +Cc: fstests, linux-btrfs

On Mon, Dec 01, 2014 at 05:11:29PM +0000, Filipe Manana wrote:
> Stress btrfs' block group allocation and deallocation while running
> fstrim in parallel. Part of the goal is also to get data block groups
> deallocated so that new metadata block groups, using the same physical
> device space ranges, get allocated while fstrim is running. This caused
> several issues ranging from invalid memory accesses, kernel crashes,
> metadata or data corruption, free space cache inconsistencies, free
> space leaks and memory leaks.
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> 
> V2: Addressed Dave's comments.
> 
>  tests/generic/038     | 152 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/038.out |   2 +
>  tests/generic/group   |   1 +
>  3 files changed, 155 insertions(+)
>  create mode 100755 tests/generic/038
>  create mode 100644 tests/generic/038.out
> 
> diff --git a/tests/generic/038 b/tests/generic/038
> new file mode 100755
> index 0000000..217aa7a
> --- /dev/null
> +++ b/tests/generic/038
> @@ -0,0 +1,152 @@
> +#! /bin/bash
> +# FSQA Test No. 038
> +#
> +# This test was motivated by btrfs issues, but it's generic enough as it
> +# doesn't use any btrfs specific features.
> +#
> +# Stress btrfs' block group allocation and deallocation while running fstrim in
> +# parallel. Part of the goal is also to get data block groups deallocated so
> +# that new metadata block groups, using the same physical device space ranges,
> +# get allocated while fstrim is running. This caused several issues ranging
> +# from invalid memory accesses, kernel crashes, metadata or data corruption,
> +# free space cache inconsistencies, free space leaks and memory leaks.
> +#
> +# These issues were fixed by the following btrfs linux kernel patches:
> +#
> +#   Btrfs: fix invalid block group rbtree access after bg is removed
> +#   Btrfs: fix crash caused by block group removal
> +#   Btrfs: fix freeing used extents after removing empty block group
> +#   Btrfs: fix race between fs trimming and block group remove/allocation
> +#   Btrfs: fix race between writing free space cache and trimming
> +#   Btrfs: make btrfs_abort_transaction consider existence of new block groups
> +#   Btrfs: fix memory leak after block remove + trimming
> +#   Btrfs: fix extent map leak on chunk allocation failure
> +#
> +# The issues were found on a qemu/kvm guest with 4 virtual CPUs, 4GB of RAM and
> +# scsi-hd devices with discard support enabled (that means hole punching in the
> +# disk's image file is performed by the host).
> +#
> +#-----------------------------------------------------------------------
> +#
> +# Copyright (C) 2014 SUSE Linux Products GmbH. All Rights Reserved.
> +# Author: Filipe Manana <fdmanana@suse.com>
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#-----------------------------------------------------------------------
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +tmp=/tmp/$$
> +status=1	# failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +	rm -fr $tmp
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# real QA test starts here
> +_need_to_be_root
> +_supported_fs btrfs

This should be "_supported_fs generic"

Thanks,
Eryu


* [PATCH v3] fstests: add btrfs test to stress chunk allocation/removal and fstrim
  2014-11-26 15:30 [PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim Filipe Manana
  2014-12-01  5:25 ` Dave Chinner
  2014-12-01 17:11 ` Filipe Manana
@ 2014-12-02 18:10 ` Filipe Manana
  2 siblings, 0 replies; 5+ messages in thread
From: Filipe Manana @ 2014-12-02 18:10 UTC (permalink / raw)
  To: fstests; +Cc: linux-btrfs, Filipe Manana

Stress btrfs' block group allocation and deallocation while running
fstrim in parallel. Part of the goal is also to get data block groups
deallocated so that new metadata block groups, using the same physical
device space ranges, get allocated while fstrim is running. This caused
several issues ranging from invalid memory accesses, kernel crashes,
metadata or data corruption, free space cache inconsistencies, free
space leaks and memory leaks.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---

V2: Addressed Dave's comments.

V3: Missing s/_supported_fs btrfs/_supported_fs generic/
    Thanks Eryu.

 tests/generic/038     | 153 ++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/038.out |   2 +
 tests/generic/group   |   1 +
 3 files changed, 156 insertions(+)
 create mode 100755 tests/generic/038
 create mode 100644 tests/generic/038.out

diff --git a/tests/generic/038 b/tests/generic/038
new file mode 100755
index 0000000..5db718c
--- /dev/null
+++ b/tests/generic/038
@@ -0,0 +1,153 @@
+#! /bin/bash
+# FSQA Test No. 038
+#
+# This test was motivated by btrfs issues, but it's generic enough as it
+# doesn't use any btrfs specific features.
+#
+# Stress btrfs' block group allocation and deallocation while running fstrim in
+# parallel. Part of the goal is also to get data block groups deallocated so
+# that new metadata block groups, using the same physical device space ranges,
+# get allocated while fstrim is running. This caused several issues ranging
+# from invalid memory accesses, kernel crashes, metadata or data corruption,
+# free space cache inconsistencies, free space leaks and memory leaks.
+#
+# These issues were fixed by the following btrfs linux kernel patches:
+#
+#   Btrfs: fix invalid block group rbtree access after bg is removed
+#   Btrfs: fix crash caused by block group removal
+#   Btrfs: fix freeing used extents after removing empty block group
+#   Btrfs: fix race between fs trimming and block group remove/allocation
+#   Btrfs: fix race between writing free space cache and trimming
+#   Btrfs: make btrfs_abort_transaction consider existence of new block groups
+#   Btrfs: fix memory leak after block remove + trimming
+#   Btrfs: fix fs mapping extent map leak
+#   Btrfs: fix unprotected deletion from pending_chunks list
+#
+# The issues were found on a qemu/kvm guest with 4 virtual CPUs, 4GB of RAM and
+# scsi-hd devices with discard support enabled (that means hole punching in the
+# disk's image file is performed by the host).
+#
+#-----------------------------------------------------------------------
+#
+# Copyright (C) 2014 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana <fdmanana@suse.com>
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	rm -fr $tmp
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+_require_fstrim
+
+rm -f $seqres.full
+
+# Keep allocating and deallocating 1G of data space with the goal of creating
+# and deleting 1 block group constantly. The intention is to race with the
+# fstrim loop below.
+fallocate_loop()
+{
+	local name=$1
+	while true; do
+		$XFS_IO_PROG -f -c "falloc -k 0 1G" \
+			$SCRATCH_MNT/$name &> /dev/null
+		sleep 3
+		$XFS_IO_PROG -c "truncate 0" \
+			$SCRATCH_MNT/$name &> /dev/null
+		sleep 3
+	done
+}
+
+trim_loop()
+{
+	while true; do
+		$FSTRIM_PROG $SCRATCH_MNT
+	done
+}
+
+# Create a bunch of small files that get their single extent inlined in the
+# btree, so that we consume a lot of metadata space and get a chance of a
+# data block group getting deleted and reused for metadata later. Sometimes
+# the creation of all these files succeeds, other times we get ENOSPC failures
+# at some point - this depends on how fast the btrfs' cleaner kthread is
+# notified about empty block groups, how fast it deletes them and how fast
+# the fallocate calls happen. So we don't really care if they all succeed or
+# not, the goal is just to keep metadata space usage growing while data block
+# groups are deleted.
+create_files()
+{
+	local prefix=$1
+
+	for ((i = 1; i <= 400000; i++)); do
+		$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 3900" \
+			$SCRATCH_MNT/"${prefix}_$i" &> /dev/null
+		if [ $? -ne 0 ]; then
+			echo "Failed creating file ${prefix}_$i" >>$seqres.full
+			break
+		fi
+	done
+
+}
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+_require_fs_space $SCRATCH_MNT $((10 * 1024 * 1024))
+
+for ((i = 0; i < $((4 * $LOAD_FACTOR)); i++)); do
+	trim_loop &
+	trim_pids[$i]=$!
+done
+
+for ((i = 0; i < $((1 * $LOAD_FACTOR)); i++)); do
+	fallocate_loop "falloc_file_$i" &
+	fallocate_pids[$i]=$!
+done
+
+create_files "foobar"
+
+kill ${fallocate_pids[@]}
+kill ${trim_pids[@]}
+wait
+
+# Sleep a bit, otherwise umount fails often with EBUSY (TODO: investigate why).
+sleep 3
+
+# The fstests framework will now check for fs consistency with fsck.
+# The trimming was racy and caused some btree nodes to get full of zeroes on
+# disk, which obviously caused fs metadata corruption. The race often led
+# to missing free space entries in a block group's free space cache too.
+
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/generic/038.out b/tests/generic/038.out
new file mode 100644
index 0000000..5e0f13e
--- /dev/null
+++ b/tests/generic/038.out
@@ -0,0 +1,2 @@
+QA output created by 038
+Silence is golden
diff --git a/tests/generic/group b/tests/generic/group
index 9f355fc..1e89848 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -40,6 +40,7 @@
 035 auto quick
 036 auto aio rw stress
 037 metadata auto quick
+038 auto stress
 053 acl repair auto quick
 062 attr udf auto quick
 068 other auto freeze dangerous stress
-- 
2.1.3


