From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-btrfs@vger.kernel.org, fstests@vger.kernel.org
Subject: Re: [PATCH v2] fstests: generic: Check if a bull fallocate will change extent number
Date: Wed, 30 Sep 2015 09:05:15 +0800 [thread overview]
Message-ID: <560B354B.9050400@cn.fujitsu.com> (raw)
In-Reply-To: <20150929215135.GZ3902@dastard>
Dave Chinner wrote on 2015/09/30 07:51 +1000:
> On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
>> Normally, a bull fallocate call on a fully written and synced file
>> should not add an extent.
>
> Why not? Filesystems can do whatever they want with extents during
> a fallocate call. e.g. if the blocks are shared, then fallocate
> might break the block sharing so future overwrites don't get
> ENOSPC. This is a requirement set down by posix_fallocate(3)
>
> "After a successful call to posix_fallocate(), subsequent writes to
> bytes in the specified range are guaranteed not to fail because of
> lack of disk space."
>
> Hence if you've got a file with shared blocks, a "full fallocate"
> must change the extent layout to break the sharing. As such, the
> premise of this test is wrong.
First, btrfs never meets the posix_fallocate requirement by its COW nature.
Btrfs fallocate can only ensure at most next write will not cause ENOSPC.
Only when btrfs fallocate a new prealloc extent, next write into it may
use the space if they are not shared between different snapshots.
Under most case, btrfs fallocate follows the behavior of other non-COW
filesystems.
Which means, btrfs won't alloc new extent if there is existing extent,
not matter if it's shared or not.
As a result, fallocate in btrfs only works in a limited use-case, and
can easily break posix requirement.
Like the following case without snapshots:
1)Fallocate 0~50M
2)Write 0~50M <- Will not return ENOSPC
3)Write 0~25M <- COW happens, allocate another 25M,
may cause ENOSPC.
Or even easier with snapshot:
1)Fallocate 0~50M in subvol A
2)Snapshot subvol A into snap B
3)Write 0~25M in subvol A *OR* snap B <- COW happens, may cause ENOSPC
As in step 3), fallocated 50M is shared, so write will be forced COW.
So I'd prefer to make btrfs follows the behavior of other non-COW
filesystems, as posix standard doesn't fit well here.
>
> That's not to say that btrfs has a bug:
>
>> Btrfs has a bug to always truncate the last page if the fallocate start
>> offset is smaller than inode size.
>
> But it' not clear that this behaviour is actually a bug if it's not
> changing the file data.
File data is not changed, as btrfs just COW the last tailing page, as
reset the last already 0 part.
Like the follow ascii arts:
0)Before
0 4K 8K 12K 16K
|///////////////////////////|000|
|<----------Extent A----------->|
The file is 14K size, on disk(to be accurate, btrfs logical address
space) it takes 16K, with last 2K padding 0.
And all that 16K is in extent A.
1)Fallocate 0~14K
In fact, all space in range 0~14K is allocated, so there is no need to
reallocate any space.
2)But in btrfs
Result will be:
0 4K 8K 12K 16K
|///////////////////////////|000|
|<------Extent A------->|<--B-->|
Btrfs has a wrong judgment, which will always re-padding the last page.
Causing a new extent, extent B to be created.
Even the contents is the same with original last page.
It's OK not to consider it as a bug, at least data is not corrupted.
But IMO the btrfs behavior is not needed and need optimization.
So kernel patch is submitted to btrfs ml:
https://patchwork.kernel.org/patch/7284431/
And if fstests is not the proper place, any idea where such "test case"
should belong?
Thanks,
Qu
>
> Cheers,
>
> Dave.
>
next prev parent reply other threads:[~2015-09-30 1:05 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-29 9:34 [PATCH v2] fstests: generic: Check if a bull fallocate will change extent number Qu Wenruo
2015-09-29 9:55 ` Eryu Guan
2015-09-29 10:16 ` Qu Wenruo
2015-09-29 10:33 ` Eryu Guan
2015-09-29 10:46 ` Qu Wenruo
2015-09-29 10:00 ` Hugo Mills
2015-09-29 10:13 ` Qu Wenruo
2015-09-29 10:24 ` Filipe Manana
2015-09-29 10:48 ` Qu Wenruo
2015-09-30 6:42 ` Duncan
2015-09-29 10:24 ` Hugo Mills
2015-09-29 21:51 ` Dave Chinner
2015-09-30 1:05 ` Qu Wenruo [this message]
2015-09-30 1:45 ` Tsutomu Itoh
2015-09-30 1:49 ` Qu Wenruo
2015-09-30 4:20 ` Dave Chinner
2015-09-30 5:48 ` Qu Wenruo
2015-10-02 8:35 ` Qu Wenruo
2015-10-06 1:31 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=560B354B.9050400@cn.fujitsu.com \
--to=quwenruo@cn.fujitsu.com \
--cc=david@fromorbit.com \
--cc=fstests@vger.kernel.org \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox