From: "Darrick J. Wong" <djwong@kernel.org>
To: Baokun Li <libaokun1@huawei.com>
Cc: fstests@vger.kernel.org, zlang@redhat.com, guaneryu@gmail.com,
amir73il@gmail.com, ritesh.list@gmail.com, yangerkun@huawei.com
Subject: Re: [PATCH v3] ext4: Regression test of ext4_lblk_t overflow
Date: Thu, 23 Nov 2023 09:03:29 -0800
Message-ID: <20231123170329.GK36175@frogsfrogsfrogs>
In-Reply-To: <83e9bfca-28ef-d499-e2e2-488b2c269c12@huawei.com>
On Thu, Nov 23, 2023 at 09:46:37PM +0800, Baokun Li wrote:
> On 2023/11/23 0:32, Darrick J. Wong wrote:
> > On Wed, Nov 22, 2023 at 07:53:14PM +0800, Baokun Li wrote:
> > > Append writes to a file approaching 16T and observe if a kernel crash is
> > > caused by ext4_lblk_t overflow triggering BUG_ON at ext4_mb_new_inode_pa().
> > > This is a regression test for commit bc056e7163ac ("ext4: fix BUG in
> > > ext4_mb_new_inode_pa() due to overflow")
> > >
> > > Signed-off-by: Baokun Li <libaokun1@huawei.com>
> > > ---
> > > V1->V2:
> > > Changes to make the use case more generic, not just for testing
> > > ext4.(ext4 and xfs have been tested)
> > > V2->V3:
> > > Clean up the code and remove hardcoding.
> > >
> > > tests/generic/737 | 53 +++++++++++++++++++++++++++++++++++++++++++
> > > tests/generic/737.out | 2 ++
> > > 2 files changed, 55 insertions(+)
> > > create mode 100755 tests/generic/737
> > > create mode 100644 tests/generic/737.out
> > >
> > > diff --git a/tests/generic/737 b/tests/generic/737
> > > new file mode 100755
> > > index 00000000..29d428ad
> > > --- /dev/null
> > > +++ b/tests/generic/737
> > > @@ -0,0 +1,53 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (c) 2023 HUAWEI. All Rights Reserved.
> > > +#
> > > +# FS QA Test No. 737
> > > +#
> > > +# Append writes to a file approaching 16T and observe if a kernel crash is
> > > +# caused by ext4_lblk_t overflow triggering BUG_ON at ext4_mb_new_inode_pa().
> > > +# This is a regression test for commit
> > > +# bc056e7163ac ("ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow")
> > > +#
> > > +. ./common/preamble
> > > +. ./common/populate
> > > +_begin_fstest auto quick insert prealloc
> > > +
> > > +# real QA test starts here
> > > +[[ "$FSTYP" =~ ext* ]] && _fixed_by_kernel_commit bc056e7163ac \
> > > + "ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow"
> > > +
> > > +_require_odirect
> > > +_require_xfs_io_command "falloc"
> > > +_require_xfs_io_command "finsert"
> > > +
> > > +dev_size=$((100 * 1024 * 1024))
> > > +_scratch_mkfs_sized $dev_size >>$seqres.full 2>&1 || _fail "mkfs failed"
> > > +
> > > +_scratch_mount
> > > +blksz="$(_get_block_size ${SCRATCH_MNT})"
> > _get_file_block_size, not _get_block_size. The first one retrieves the
> > file allocation unit (e.g. ext4 bigalloc cluster size / xfs rt extent
> > size) whereas the second merely returns the base fs block size.
> >
> > That is an important distinction when you're messing with fallocate. :)
> _get_file_block_size is implemented as follows:
>
> _get_file_block_size()
> {
> 	if [ -z $1 ] || [ ! -d $1 ]; then
> 		echo "Missing mount point argument for _get_file_block_size"
> 		exit 1
> 	fi
>
> 	case "$FSTYP" in
> 	"ocfs2")
> 		stat -c '%o' $1
> 		;;
> 	"xfs")
> 		_xfs_get_file_block_size $1
> 		;;
> 	*)
> 		_get_block_size $1
> 		;;
> 	esac
> }
>
> The return values of ocfs2 and xfs may be different, but they are the same
> for ext4. And the logical blocks recorded in ext4 are in blocks, not
> clusters.
> I'll replace _get_block_size with _get_file_block_size if
> _get_file_block_size should be used in xfs.
Oh silly me, I forgot that the logical block mappings in ext4 remain in
units of fs blocks, not bigalloc clusters. So this doesn't make much of
a difference.
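
To make the unit point concrete, here is a rough sketch of the arithmetic only (not kernel code). It maps a byte offset to a logical block number in fs-block units and shows what is left after truncation to 32 bits. The helper names lblk_of and as_u32 are made up for illustration.

```shell
#!/bin/bash
# Illustration only, not kernel code: map a byte offset to a logical block
# number in fs-block units, then show what survives in a 32-bit field.

# lblk_of OFFSET BLKSZ: logical block number for a byte offset (hypothetical helper)
lblk_of() {
	echo $(( $1 / $2 ))
}

# as_u32 VALUE: truncate to 32 bits, as a u32 would (hypothetical helper)
as_u32() {
	echo $(( $1 & 0xffffffff ))
}

blksz=4096
last=$(( (1 << 44) - blksz ))	# offset of the last block below 16T at 4k
next=$(( 1 << 44 ))		# offset of the first block at 16T

echo "last: lblk=$(lblk_of $last $blksz) u32=$(as_u32 $(lblk_of $last $blksz))"
echo "next: lblk=$(lblk_of $next $blksz) u32=$(as_u32 $(lblk_of $next $blksz))"
```

At 4k the last block below 16T is 0xffffffff, and the next one no longer fits in a u32.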
> > > +# Reserve 1M space
> > > +$XFS_IO_PROG -f -c "falloc 0 1M" "${SCRATCH_MNT}/tmp" >> $seqres.full
> > > +
> > > +# Create a file (~16T) with logical block numbers close to overflow
> > > +$XFS_IO_PROG -f -c "falloc 0 10M" "${SCRATCH_MNT}/file" >> $seqres.full
> > > +insert_size=$((blksz * 4096 - 10 - 3))
> > What if blksz == 64k ? This won't compute a file position slightly
> > below 16T. I think the comment is wrong since you're trying to overflow
> > the u32 ext4_lblk_t, correct?
> Yes, the comment here is wrong. The actual intention here is to construct a
> file with logical blocks close to 0x100000000.
> >
> > I think what you really want is something more like...
> >
> > # Shift the last 9M of the file preallocations to a position just short
> > # of overflowing ext4_lblk_t.
> > max_pos=$(( 0xffffffff * file_blksz ))
> > finsert_len=$(( max_pos - ((10 + 3) << 20) ))
> > $XFS_IO_PROG -f -c "finsert 1M ${finsert_len}" "${SCRATCH_MNT}/file" >> $seqres.full
> Exactly!
> > Not sure why you shift 9M of data to 13M below what I think is the
> > upper range of ext4_lblk_t; I would have thought that would be
> > (max_pos - 9MB) but I'm assuming you know the reproduction circumstances
> > better than me...
> >
> > --D
> At 4k block size, when appending writes to a file close to 16T, the block
> allocation request will be enlarged to 8M, and the current file size + block
> allocation request size will not exceed 16T.
>
> Therefore, the above is just using finsert to construct a file with maximum
> logical block number close to 0x100000000, the corresponding size at 4k can
> be in the range of (16T-8M, 16T), the insertion location does not have any
> special meaning.
>
> 3M is not a special value, theoretically it can be in the range of
> (1M (reserved tmp), 8M]. But ext4 reserves 2% of the blocks for metadata,
> which in this case is 2M, so the interval in which the problem can be
> triggered becomes (2M, 8M].
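
For reference, a quick sketch of how the two finsert sizings discussed above compare across block sizes. This is arithmetic only and does not touch a filesystem. The function names are invented for this example; the formulas are the v3 one (a count of MiB) and the 0xffffffff-based one (a count of bytes).

```shell
#!/bin/bash
# Illustration only: compare the two ways of sizing the finsert in this thread.

# v3 patch: insert_size is a number of MiB, (blksz * 4096 - 10 - 3),
# converted to bytes here. (hypothetical helper name)
v3_insert_bytes() {
	local blksz=$1
	echo $(( (blksz * 4096 - 10 - 3) << 20 ))
}

# Suggested form: bytes, relative to the largest u32 logical block number.
# (hypothetical helper name)
suggested_insert_bytes() {
	local blksz=$1
	local max_pos=$(( 0xffffffff * blksz ))
	echo $(( max_pos - ((10 + 3) << 20) ))
}

for blksz in 1024 4096 65536; do
	v3=$(v3_insert_bytes $blksz)
	sg=$(suggested_insert_bytes $blksz)
	printf 'blksz=%-6d v3=%d suggested=%d diff=%d\n' \
		"$blksz" "$v3" "$sg" $(( v3 - sg ))
done
```

Since blksz * 4096 MiB equals blksz * 2^32 bytes, both forms scale with the block size, and they differ by exactly one block. Only at 4k does the result sit near 16T; at 64k it is near 256T.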
Does the test trigger the bug on other blocksizes like 1k or 64k?
Oh, there's a v4, will go look at that.
--D
> > > +$XFS_IO_PROG -f -c "finsert 1M ${insert_size}M" "${SCRATCH_MNT}/file" >> $seqres.full
> > > +
> > > +# Filling up the free space ensures that the pre-allocated space is the reserved space.
> > > +nr_free=$(stat -f -c '%f' ${SCRATCH_MNT})
> > > +_fill_fs $((nr_free * blksz)) ${SCRATCH_MNT}/fill $blksz 0 >> $seqres.full 2>&1
> > > +sync
> > > +
> > > +# Remove reserved space to gain free space for allocation
> > > +rm -f ${SCRATCH_MNT}/tmp
> > > +
> > > +# Trying to allocate two blocks triggers BUG_ON.
> > > +$XFS_IO_PROG -c "open -ad ${SCRATCH_MNT}/file" -c "pwrite -S 0xff 0 $((2 * blksz))" >> $seqres.full
> > > +
> > > +echo "Silence is golden"
> > > +
> > > +# success, all done
> > > +status=0
> > > +exit
> > > diff --git a/tests/generic/737.out b/tests/generic/737.out
> > > new file mode 100644
> > > index 00000000..67b83d78
> > > --- /dev/null
> > > +++ b/tests/generic/737.out
> > > @@ -0,0 +1,2 @@
> > > +QA output created by 737
> > > +Silence is golden
> > > --
> > > 2.31.1
> > >
> > >
> Thanks!
> --
> With Best Regards,
> Baokun Li
> .
>