From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Filipe Manana <fdmanana@kernel.org>
Cc: Qu Wenruo <wqu@suse.com>,
linux-btrfs@vger.kernel.org, fstests@vger.kernel.org
Subject: Re: [PATCH] btrfs/276: allow a slight increase in the number of extents
Date: Thu, 3 Aug 2023 09:30:46 +0800 [thread overview]
Message-ID: <b8c5c67b-57f9-eacb-bf54-ee49a89f6e98@gmx.com> (raw)
In-Reply-To: <CAL3q7H7kNyMBRPOH2DjohGKog69H4sdd306cZpe17fLLsHPsSA@mail.gmail.com>
On 2023/8/3 01:28, Filipe Manana wrote:
> On Wed, Aug 2, 2023 at 12:18 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2023/8/2 18:36, Qu Wenruo wrote:
>>>
>>>
>>> On 2023/8/2 18:23, Filipe Manana wrote:
>>>> On Tue, Aug 1, 2023 at 8:16 AM Qu Wenruo <wqu@suse.com> wrote:
>>>>>
>>>>> [BUG]
>>>>> Sometimes test case btrfs/276 would fail with extra number of extents:
>>>>>
>>>>> - output mismatch (see /opt/xfstests/results//btrfs/276.out.bad)
>>>>> --- tests/btrfs/276.out 2023-07-19 07:24:07.000000000 +0000
>>>>> +++ /opt/xfstests/results//btrfs/276.out.bad 2023-07-28
>>>>> 04:15:06.223985372 +0000
>>>>> @@ -1,16 +1,16 @@
>>>>> QA output created by 276
>>>>> wrote 17179869184/17179869184 bytes at offset 0
>>>>> XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>>>> -Number of non-shared extents in the whole file: 131072
>>>>> +Number of non-shared extents in the whole file: 131082
>>>>> Create a snapshot of 'SCRATCH_MNT' in 'SCRATCH_MNT/snap'
>>>>> -Number of shared extents in the whole file: 131072
>>>>> ...
>>>>> (Run 'diff -u /opt/xfstests/tests/btrfs/276.out
>>>>> /opt/xfstests/results//btrfs/276.out.bad' to see the entire diff)
>>>>>
>>>>> [CAUSE]
>>>>> The test case uses golden output to record the number of total extents
>>>>> of a 16G file.
>>>>>
>>>>> This is not reliable as we can have writeback happen halfway, resulting
>>>>> smaller extents thus slightly more extents.
>>>>>
>>>>> With a VM with 4G memory, I have a chance around 1/10 hitting this
>>>>> false alert.
>>>>>
>>>>> [FIX]
>>>>> Instead of using golden output, we allow a slight (5%) float in the
>>>>> number of extents, and move the 131072 (and 131072 - 16) from golden
>>>>> output, so even if we have a slightly more extents, we can still pass
>>>>> the test.
>>>>>
>>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>>> ---
>>>>> tests/btrfs/276 | 41 ++++++++++++++++++++++++++++++++++++-----
>>>>> tests/btrfs/276.out | 4 ----
>>>>> 2 files changed, 36 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/tests/btrfs/276 b/tests/btrfs/276
>>>>> index 944b0c8f..a63b28bb 100755
>>>>> --- a/tests/btrfs/276
>>>>> +++ b/tests/btrfs/276
>>>>> @@ -65,10 +65,17 @@ count_not_shared_extents()
>>>>>
>>>>> # Create a 16G file as that results in 131072 extents, all with a
>>>>> size of 128K
>>>>> # (due to compression), and a fs tree with a height of 3 (root node
>>>>> at level 2).
>>>>> +#
>>>>> +# But due to writeback can happen halfway, we may have slightly more
>>>>> extents
>>>>> +# than 128K, so we allow 5% increase in the number of extents.
>>>>> +#
>>>>> # We want to verify later that fiemap correctly reports the
>>>>> sharedness of each
>>>>> # extent, even when it needs to switch from one leaf to the next
>>>>> one and from a
>>>>> # node at level 1 to the next node at level 1.
>>>>> #
>>>>> +nr_extents_lower=$((128 * 1024))
>>>>> +nr_extents_upper=$((128 * 1024 + 128 * 1024 / 20))
>>>>> +
>>>>> $XFS_IO_PROG -f -c "pwrite -b 8M 0 16G" $SCRATCH_MNT/foo |
>>>>> _filter_xfs_io
>>>>
>>>> Does adding '-s' (fsync after every write) to the $XFS_IO_PROG fixes
>>>> the issue?
>>>> On my test vm, it doesn't increase runtime by that much (16 to 23
>>>> seconds).
>>>
>>> This indeed is much better, without dirty pages pressure we can go the
>>> old golden output.
>>
>> Unfortunately I still have a very low chance (~1/20) to hit 1~3 more
>> extent than the golden output.
>>
>> There are still extra things like the commit intervals to let us to
>> writeback halfway.
>>
>> The best situation would be direct IO, but unfortunately direct IO
>> doesn't support compression, thus the resulted file would lead to merged
>> fiemap results.
>
> The compression is not needed.
> I wrote the test to use compression because it makes creating a file with
> thousands of extents very fast and using very little space.
>
> The goal is really to have many file extent items spanning multiple leaves and
> to have a root at level 2 (or higher), to verify the sharedness
> detection is correct
> for subtrees.
>
>>
>> The other solution is to write between two files using direct IO, to
>> make each extent inside the same file not continuous with each other.
>>
>> But that would lead to at least 512M * 2, and we also need to do the
>> same interleaved writes again for the 1M writes.
>>
>> Any ideas would be appreciated.
>
> See if this works:
>
> https://pastebin.com/raw/QnNtSrDP
This works fine as after 20 runs I still didn't hit any output mismatch.
Although you can further improve it by using direct IO and much smaller
blocksize (4K), which can further reduce the size and completely
eliminate the possibility of writeback halfway.
(This would require 4K sector size though)
In that case, you can even reuse the old 131072 number, as we only need
around 512M (file size would be 1G) for data.
Another thing is, if tree level 3 (root node level 2) is needed, we can
even further reduce the file size by requiring 4K nodesize, which can
further speed up the test, without the need for xattr.
>
> It accomplishes the same goals and it's still fast (about 23 seconds
> on my non-debug test vm, before it was about 16 seconds).
The modified version is already fast enough even with a KASAN enabled
kernel, only 60s compared to the old 120+s.
Mind to send the modification as a proper fix? That's a much improved
version compared to mine.
Thanks,
Qu
>
> Thanks.
>
>>
>> Thanks,
>> Qu
>>>
>>> Thanks,
>>> Qu
>>>>
>>>> I'd rather do that so that we can be sure fiemap is working correctly
>>>> and not returning more extents than there really are - this approach
>>>> of allowing a bit more allows for that type of bug to be unnoticed,
>>>> plus that little bit more might not be always enough (depending on
>>>> available rm, writeback settings, etc).
>>>>
>>>> Thanks.
>>>>
>>>>>
>>>>> # Sync to flush delalloc and commit the current transaction, so
>>>>> fiemap will see
>>>>> @@ -76,13 +83,22 @@ $XFS_IO_PROG -f -c "pwrite -b 8M 0 16G"
>>>>> $SCRATCH_MNT/foo | _filter_xfs_io
>>>>> sync
>>>>>
>>>>> # All extents should be reported as non shared (131072 extents).
>>>>> -echo "Number of non-shared extents in the whole file:
>>>>> $(count_not_shared_extents)"
>>>>> +found1=$(count_not_shared_extents)
>>>>> +echo "Number of non-shared extents in the whole file: ${found1}" >>
>>>>> $seqres.full
>>>>> +
>>>>> +if [ $found1 -lt $nr_extents_lower -o $found1 -gt $nr_extents_upper
>>>>> ]; then
>>>>> + echo "unexpected initial number of extents, has $found1
>>>>> expect [$nr_extents_lower, $nr_extents_upper]"
>>>>> +fi
>>>>>
>>>>> # Creating a snapshot.
>>>>> $BTRFS_UTIL_PROG subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
>>>>> | _filter_scratch
>>>>>
>>>>> # We have a snapshot, so now all extents should be reported as shared.
>>>>> -echo "Number of shared extents in the whole file:
>>>>> $(count_shared_extents)"
>>>>> +found2=$(count_shared_extents)
>>>>> +echo "Number of shared extents in the whole file: ${found2}" >>
>>>>> $seqres.full
>>>>> +if [ $found2 -ne $found1 ]; then
>>>>> + echo "unexpected shared extents, has $found2 expect $found1"
>>>>> +fi
>>>>>
>>>>> # Now COW two file ranges, of 1M each, in the snapshot's file.
>>>>> # So 16 extents should become non-shared after this.
>>>>> @@ -97,8 +113,18 @@ sync
>>>>>
>>>>> # Now we should have 16 non-shared extents and 131056 (131072 - 16)
>>>>> shared
>>>>> # extents.
>>>>> -echo "Number of non-shared extents in the whole file:
>>>>> $(count_not_shared_extents)"
>>>>> -echo "Number of shared extents in the whole file:
>>>>> $(count_shared_extents)"
>>>>> +found3=$(count_not_shared_extents)
>>>>> +found4=$(count_shared_extents)
>>>>> +echo "Number of non-shared extents in the whole file: ${found3}"
>>>>> +echo "Number of shared extents in the whole file: ${found4}" >>
>>>>> $seqres.full
>>>>> +
>>>>> +if [ $found3 != 16 ]; then
>>>>> + echo "Unexpected number of non-shared extents, has $found3
>>>>> expect 16"
>>>>> +fi
>>>>> +
>>>>> +if [ $found4 != $(( $found1 - $found3 )) ]; then
>>>>> + echo "Unexpected number of shared extents, has $found4 expect
>>>>> $(( $found1 - $found3 ))"
>>>>> +fi
>>>>>
>>>>> # Check that the non-shared extents are indeed in the expected file
>>>>> ranges (each
>>>>> # with 8 extents).
>>>>> @@ -117,7 +143,12 @@ _scratch_remount commit=1
>>>>> sleep 1.1
>>>>>
>>>>> # Now all extents should be reported as not shared (131072 extents).
>>>>> -echo "Number of non-shared extents in the whole file:
>>>>> $(count_not_shared_extents)"
>>>>> +found5=$(count_not_shared_extents)
>>>>> +echo "Number of non-shared extents in the whole file: ${found5}" >>
>>>>> $seqres.full
>>>>> +
>>>>> +if [ $found5 != $found1 ]; then
>>>>> + echo "Unexpected final number of non-shared extents, has
>>>>> $found5 expect $found1"
>>>>> +fi
>>>>>
>>>>> # success, all done
>>>>> status=0
>>>>> diff --git a/tests/btrfs/276.out b/tests/btrfs/276.out
>>>>> index 3bf5a5e6..e318c2e9 100644
>>>>> --- a/tests/btrfs/276.out
>>>>> +++ b/tests/btrfs/276.out
>>>>> @@ -1,16 +1,12 @@
>>>>> QA output created by 276
>>>>> wrote 17179869184/17179869184 bytes at offset 0
>>>>> XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>>>> -Number of non-shared extents in the whole file: 131072
>>>>> Create a snapshot of 'SCRATCH_MNT' in 'SCRATCH_MNT/snap'
>>>>> -Number of shared extents in the whole file: 131072
>>>>> wrote 1048576/1048576 bytes at offset 8388608
>>>>> XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>>>> wrote 1048576/1048576 bytes at offset 12884901888
>>>>> XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>>>> Number of non-shared extents in the whole file: 16
>>>>> -Number of shared extents in the whole file: 131056
>>>>> Number of non-shared extents in range [8M, 9M): 8
>>>>> Number of non-shared extents in range [12G, 12G + 1M): 8
>>>>> Delete subvolume (commit): 'SCRATCH_MNT/snap'
>>>>> -Number of non-shared extents in the whole file: 131072
>>>>> --
>>>>> 2.41.0
>>>>>
next prev parent reply other threads:[~2023-08-03 1:31 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-01 6:55 [PATCH] btrfs/276: allow a slight increase in the number of extents Qu Wenruo
2023-08-01 13:57 ` Josef Bacik
2023-08-01 15:23 ` Darrick J. Wong
2023-08-02 10:23 ` Filipe Manana
2023-08-02 10:36 ` Qu Wenruo
2023-08-02 11:18 ` Qu Wenruo
2023-08-02 17:28 ` Filipe Manana
2023-08-03 1:30 ` Qu Wenruo [this message]
2023-08-03 9:02 ` Filipe Manana
2023-08-03 10:32 ` Filipe Manana
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b8c5c67b-57f9-eacb-bf54-ee49a89f6e98@gmx.com \
--to=quwenruo.btrfs@gmx.com \
--cc=fdmanana@kernel.org \
--cc=fstests@vger.kernel.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox