From: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Cc: linux-btrfs@vger.kernel.org, fstests@vger.kernel.org
Subject: Re: [PATCH v2] fstests: generic: Check if a bull fallocate will change extent number
Date: Wed, 30 Sep 2015 10:45:01 +0900 [thread overview]
Message-ID: <560B3E9D.8050701@jp.fujitsu.com> (raw)
In-Reply-To: <560B354B.9050400@cn.fujitsu.com>
On 2015/09/30 10:05, Qu Wenruo wrote:
> Dave Chinner wrote on 2015/09/30 07:51 +1000:
>> On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
>>> Normally, a bull fallocate call on a fully written and synced file
>>> should not add an extent.
>>
>> Why not? Filesystems can do whatever they want with extents during
>> a fallocate call. e.g. if the blocks are shared, then fallocate
>> might break the block sharing so future overwrites don't get
>> ENOSPC. This is a requirement set down by posix_fallocate(3)
>>
>> "After a successful call to posix_fallocate(), subsequent writes to
>> bytes in the specified range are guaranteed not to fail because of
>> lack of disk space."
>>
>> Hence if you've got a file with shared blocks, a "full fallocate"
>> must change the extent layout to break the sharing. As such, the
>> premise of this test is wrong.
>
> First, btrfs never meets the posix_fallocate requirement by its COW nature.
>
> Btrfs fallocate can only ensure at most next write will not cause ENOSPC.
> Only when btrfs fallocate a new prealloc extent, next write into it may use the space if they are not shared between different snapshots.
>
> Under most case, btrfs fallocate follows the behavior of other non-COW filesystems.
> Which means, btrfs won't alloc new extent if there is existing extent, not matter if it's shared or not.
>
> As a result, fallocate in btrfs only works in a limited use-case, and can easily break posix requirement.
> Like the following case without snapshots:
> 1)Fallocate 0~50M
> 2)Write 0~50M <- Will not return ENOSPC
> 3)Write 0~25M <- COW happens, allocate another 25M,
> may cause ENOSPC.
> Or even easier with snapshot:
> 1)Fallocate 0~50M in subvol A
> 2)Snapshot subvol A into snap B
> 3)Write 0~25M in subvol A *OR* snap B <- COW happens, may cause ENOSPC
>
> As in step 3), fallocated 50M is shared, so write will be forced COW.
>
> So I'd prefer to make btrfs follows the behavior of other non-COW filesystems, as posix standard doesn't fit well here.
>>
>> That's not to say that btrfs has a bug:
>>
>>> Btrfs has a bug to always truncate the last page if the fallocate start
>>> offset is smaller than inode size.
>>
>> But it' not clear that this behaviour is actually a bug if it's not
>> changing the file data.
> File data is not changed, as btrfs just COW the last tailing page, as reset the last already 0 part.
>
> Like the follow ascii arts:
>
> 0)Before
> 0 4K 8K 12K 16K
> |///////////////////////////|000|
> |<----------Extent A----------->|
> The file is 14K size, on disk(to be accurate, btrfs logical address space) it takes 16K, with last 2K padding 0.
>
> And all that 16K is in extent A.
>
> 1)Fallocate 0~14K
> In fact, all space in range 0~14K is allocated, so there is no need to reallocate any space.
>
> 2)But in btrfs
> Result will be:
> 0 4K 8K 12K 16K
> |///////////////////////////|000|
> |<------Extent A------->|<--B-->|
>
> Btrfs has a wrong judgment, which will always re-padding the last page.
> Causing a new extent, extent B to be created.
> Even the contents is the same with original last page.
>
> It's OK not to consider it as a bug, at least data is not corrupted.
> But IMO the btrfs behavior is not needed and need optimization.
> So kernel patch is submitted to btrfs ml:
> https://patchwork.kernel.org/patch/7284431/
Is this https://patchwork.kernel.org/patch/7283461/ ? Right?
Thanks,
Tsutomu
>
> And if fstests is not the proper place, any idea where such "test case" should belong?
>
> Thanks,
> Qu
>
>>
>> Cheers,
>>
>> Dave.
>>
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2015-09-30 1:45 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-29 9:34 [PATCH v2] fstests: generic: Check if a bull fallocate will change extent number Qu Wenruo
2015-09-29 9:55 ` Eryu Guan
2015-09-29 10:16 ` Qu Wenruo
2015-09-29 10:33 ` Eryu Guan
2015-09-29 10:46 ` Qu Wenruo
2015-09-29 10:00 ` Hugo Mills
2015-09-29 10:13 ` Qu Wenruo
2015-09-29 10:24 ` Filipe Manana
2015-09-29 10:48 ` Qu Wenruo
2015-09-30 6:42 ` Duncan
2015-09-29 10:24 ` Hugo Mills
2015-09-29 21:51 ` Dave Chinner
2015-09-30 1:05 ` Qu Wenruo
2015-09-30 1:45 ` Tsutomu Itoh [this message]
2015-09-30 1:49 ` Qu Wenruo
2015-09-30 4:20 ` Dave Chinner
2015-09-30 5:48 ` Qu Wenruo
2015-10-02 8:35 ` Qu Wenruo
2015-10-06 1:31 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=560B3E9D.8050701@jp.fujitsu.com \
--to=t-itoh@jp.fujitsu.com \
--cc=fstests@vger.kernel.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=quwenruo@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).