From: Anand Jain <anand.jain@oracle.com>
To: Qu Wenruo <wqu@suse.com>,
linux-btrfs@vger.kernel.org, fstests@vger.kernel.org
Subject: Re: [PATCH v2] fstests: btrfs: a new test case to verify a use-after-free bug
Date: Mon, 26 Aug 2024 21:55:29 +0800 [thread overview]
Message-ID: <9feb3f6e-b682-4978-9d35-3f5176d96e38@oracle.com> (raw)
In-Reply-To: <20240824103021.264856-1-wqu@suse.com>
On 24/8/24 6:30 pm, Qu Wenruo wrote:
> [BUG]
> There is a use-after-free bug triggered very randomly by btrfs/125.
>
> With KASAN enabled it can be reproduced on certain setups; without
> KASAN it can lead to a crash.
>
> [CAUSE]
> The test case btrfs/125 uses RAID5 for metadata, which has a known
> RMW problem when there is some on-disk corruption.
>
> RMW will use the corrupted contents to generate a new parity, losing the
> final chance to rebuild the contents.
>
> This is specific to metadata: for data we have an extra data checksum,
> while metadata has extra problems, such as a possible deadlock caused
> by the extra metadata read/recovery needed to search the extent tree.
>
> This problem has been known for a while, with no better solution than
> avoiding RAID56 for metadata:
>
>> Metadata
>> Do not use raid5 nor raid6 for metadata. Use raid1 or raid1c3
>> respectively.
>
> Combined with the csum tree corruption above: since RAID5 is stripe
> based, btrfs needs to split its read bios at stripe boundaries, and
> after a split it does a csum tree lookup for the expected checksum.
>
> But if that csum lookup fails, the error path doesn't handle the
> split bios properly, leading to a double free of the original bio
> (the one containing the bio vectors).
>
> [NEW TEST CASE]
> Unlike the original btrfs/125, which reproduces the bug only very
> randomly, introduce a new test case that verifies the specific
> behavior by:
>
> - Create a btrfs with enough csum leaves
>   To bump the csum tree level, use the minimal possible nodesize (4K),
>   and write 32M of data, which needs at least 8 leaves for data checksums
>
> - Find the last csum tree leaf and corrupt it
>
> - Read the data many times until we trigger the bug or exit gracefully
>   On an x86_64 VM with KASAN enabled (a setup which was never able to
>   trigger the btrfs/125 failure), this can trigger the KASAN report in
>   just 4 iterations (the default iteration count is 32).
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Changelog:
> v2:
> - Fix the wrong commit hash
>   The proper fix is not yet merged; the old hash was a placeholder
>   copied from another test case that I forgot to remove.
>
> - Minor wording update
>
> - Add to "dangerous" group
> ---
> tests/btrfs/319 | 84 +++++++++++++++++++++++++++++++++++++++++++++
> tests/btrfs/319.out | 2 ++
> 2 files changed, 86 insertions(+)
> create mode 100755 tests/btrfs/319
> create mode 100644 tests/btrfs/319.out
>
> diff --git a/tests/btrfs/319 b/tests/btrfs/319
> new file mode 100755
> index 00000000..4be2b50b
> --- /dev/null
> +++ b/tests/btrfs/319
> @@ -0,0 +1,84 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2024 SUSE Linux Products GmbH. All Rights Reserved.
> +#
> +# FS QA Test 319
> +#
> +# Make sure data csum lookup failure will not lead to double bio freeing
> +#
> +. ./common/preamble
> +_begin_fstest auto quick dangerous
> +
> +_require_scratch
> +_fixed_by_kernel_commit xxxxxxxxxxxx \
> + "btrfs: fix a use-after-free bug when hitting errors inside btrfs_submit_chunk()"
> +
> +# The final fs will have a corrupted csum tree, which will never pass fsck
> +_require_scratch_nocheck
> +_require_scratch_dev_pool 2
> +
> +# Use RAID0 for data to get bios split at stripe boundaries.
> +# This is required to trigger the bug.
> +_check_btrfs_raid_type raid0
Did you mean to use _require_btrfs_raid_type(raid0)? Otherwise,
the line has no effect since you're not checking
_check_btrfs_raid_type's return value.
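To illustrate the distinction, here is a simplified sketch, assuming the
usual fstests convention that _require_* helpers skip the test via
_notrun while _check_* helpers only report support via their exit
status (these are stand-ins, not the real fstests helpers):

```shell
# Simplified stand-ins for the fstests helpers, for illustration only.
_notrun() { echo "notrun: $*"; exit 0; }

# _check_* style: only reports support via its exit status.
_check_btrfs_raid_type() {
	[ "$1" = "raid0" ]	# stand-in for the real mkfs probe
}

# _require_* style: acts on that status and skips when unsupported.
_require_btrfs_raid_type() {
	_check_btrfs_raid_type "$1" || _notrun "raid type $1 not supported"
}

_check_btrfs_raid_type raid0		# bare call: status silently discarded
_require_btrfs_raid_type raid0 && echo "raid0 available"
```

A bare `_check_*` call, like in the patch, is a no-op unless something
inspects `$?`.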
The rest looks good.
Thx.
Anand
> +
> +# This test uses 4K sectorsize and 4K nodesize, so that we can easily
> +# create a higher csum tree level.
> +_require_btrfs_support_sectorsize 4096
> +
> +# The bug itself has a race window; run this many times to make sure it
> +# triggers. On an x86_64 VM with KASAN enabled, it is normally triggered
> +# before the 10th run.
> +runtime=32
> +
> +_scratch_pool_mkfs "-d raid0 -m single -n 4k -s 4k" >> $seqres.full 2>&1
> +# This test requires data checksum to trigger the bug.
> +_scratch_mount -o datasum,datacow
> +
> +# For the smallest csum size (CRC32C) it's 4 bytes per 4K sector, so
> +# writing 32M of data needs at least 32K of data checksums, which takes
> +# at least 8 leaves.
> +_pwrite_byte 0xef 0 32m "$SCRATCH_MNT/foobar" > /dev/null
> +sync
> +_scratch_unmount
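The leaf estimate in the comment above can be sanity-checked with
back-of-envelope arithmetic (item and leaf headers are ignored, so
this is a lower bound):

```shell
# Lower-bound estimate of csum leaves for 32M of data: 4K sectors,
# 4-byte CRC32C per sector, 4K nodesize. Headers are ignored.
data_bytes=$((32 * 1024 * 1024))
csum_bytes=$((data_bytes / 4096 * 4))	# 32K of checksums
min_leaves=$((csum_bytes / 4096))	# at least 8 leaves
echo "$csum_bytes $min_leaves"
```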
> +
> +# Search for the last leaf of the csum tree; that will be the target to
> +# destroy.
> +$BTRFS_UTIL_PROG inspect dump-tree -t csum $SCRATCH_DEV >> $seqres.full
> +target_bytenr=$($BTRFS_UTIL_PROG inspect dump-tree -t csum $SCRATCH_DEV | grep "leaf.*flags" | sort | tail -n1 | cut -f2 -d\ )
> +
> +if [ -z "$target_bytenr" ]; then
> +	_fail "unable to locate the last csum tree leaf"
> +fi
> +
> +echo "bytenr of csum tree leaf to corrupt: $target_bytenr" >> $seqres.full
> +
> +# Corrupt that csum tree block.
> +physical=$(_btrfs_get_physical "$target_bytenr" 1)
> +dev=$(_btrfs_get_device_path "$target_bytenr" 1)
> +
> +echo "physical bytenr: $physical" >> $seqres.full
> +echo "physical device: $dev" >> $seqres.full
> +
> +_pwrite_byte 0x00 "$physical" 4 "$dev" > /dev/null
> +
> +for (( i = 0; i < $runtime; i++ )); do
> + echo "=== run $i/$runtime ===" >> $seqres.full
> + _scratch_mount -o ro
> +	# Since the data is on RAID0, read bios will be split at the stripe
> +	# (64K sized) boundary. If the csum lookup fails due to the corrupted
> +	# csum tree, there is a race window that can lead to a double bio free
> +	# (triggering KASAN at least).
> + cat "$SCRATCH_MNT/foobar" &> /dev/null
> + _scratch_unmount
> +
> +	# Manually check dmesg for "BUG", and do not call _check_dmesg(),
> +	# as that would clear the 'check_dmesg' file and skip the final
> +	# check after the test.
> +	# For now just focus on the "BUG:" line from KASAN.
> + if _check_dmesg_for "BUG" ; then
> + _fail "Critical error(s) found in dmesg"
> + fi
> +done
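The stripe-boundary splitting described in the comment above can be
sketched with plain arithmetic (assuming the usual 64K RAID0 stripe
length; the offsets are file-relative and chosen only for illustration):

```shell
# Where a 192K read starting at offset 32K gets cut by 64K stripe
# boundaries; each printed segment would become its own bio after
# splitting. Purely illustrative arithmetic, not btrfs code.
stripe=$((64 * 1024))
start=$((32 * 1024))
end=$((start + 192 * 1024))
pos=$start
while [ "$pos" -lt "$end" ]; do
	next=$(( (pos / stripe + 1) * stripe ))	# next stripe boundary
	[ "$next" -gt "$end" ] && next=$end	# clamp the final segment
	echo "segment: $pos..$next"
	pos=$next
done
```

It is one of these per-segment bios whose error path, on a failed csum
lookup, ends up freeing the original bio twice.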
> +
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/319.out b/tests/btrfs/319.out
> new file mode 100644
> index 00000000..d40c929a
> --- /dev/null
> +++ b/tests/btrfs/319.out
> @@ -0,0 +1,2 @@
> +QA output created by 319
> +Silence is golden
Thread overview: 5+ messages
2024-08-24 10:30 [PATCH v2] fstests: btrfs: a new test case to verify a use-after-free bug Qu Wenruo
2024-08-26 12:14 ` Filipe Manana
2024-08-26 12:45 ` Filipe Manana
2024-08-26 22:15 ` Qu Wenruo
2024-08-26 13:55 ` Anand Jain [this message]