Re: [PATCH v2] fstests: generic: Check if a bull fallocate will change extent number

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Cc: linux-btrfs@vger.kernel.org, fstests@vger.kernel.org
Subject: Re: [PATCH v2] fstests: generic: Check if a bull fallocate will change extent number
Date: Wed, 30 Sep 2015 10:45:01 +0900	[thread overview]
Message-ID: <560B3E9D.8050701@jp.fujitsu.com> (raw)
In-Reply-To: <560B354B.9050400@cn.fujitsu.com>

On 2015/09/30 10:05, Qu Wenruo wrote:
> Dave Chinner wrote on 2015/09/30 07:51 +1000:
>> On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
>>> Normally, a bull fallocate call on a fully written and synced file
>>> should not add an extent.
>>
>> Why not? Filesystems can do whatever they want with extents during
>> a fallocate call. e.g. if the blocks are shared, then fallocate
>> might break the block sharing so future overwrites don't get
>> ENOSPC. This is a requirement set down by posix_fallocate(3)
>>
>> "After a successful call to posix_fallocate(), subsequent writes to
>> bytes in the specified range are guaranteed not  to fail because of
>> lack of disk space."
>>
>> Hence if you've got a file with shared blocks, a "full fallocate"
>> must change the extent layout to break the sharing. As such, the
>> premise of this test is wrong.
>
> First, btrfs never meets the posix_fallocate requirement by its COW nature.
>
> Btrfs fallocate can only ensure at most next write will not cause ENOSPC.
> Only when btrfs fallocate a new prealloc extent, next write into it may use the space if they are not shared between different snapshots.
>
> Under most case, btrfs fallocate follows the behavior of other non-COW filesystems.
> Which means, btrfs won't alloc new extent if there is existing extent, not matter if it's shared or not.
>
> As a result, fallocate in btrfs only works in a limited use-case, and can easily break posix requirement.
> Like the following case without snapshots:
> 1)Fallocate 0~50M
> 2)Write 0~50M        <- Will not return ENOSPC
> 3)Write 0~25M        <- COW happens, allocate another 25M,
>                 may cause ENOSPC.
> Or even easier with snapshot:
> 1)Fallocate 0~50M in subvol A
> 2)Snapshot subvol A into snap B
> 3)Write 0~25M in subvol A *OR* snap B <- COW happens, may cause ENOSPC
>
> As in step 3), fallocated 50M is shared, so write will be forced COW.
>
> So I'd prefer to make btrfs follows the behavior of other non-COW filesystems, as posix standard doesn't fit well here.
>>
>> That's not to say that btrfs has a bug:
>>
>>> Btrfs has a bug to always truncate the last page if the fallocate start
>>> offset is smaller than inode size.
>>
>> But it' not clear that this behaviour is actually a bug if it's not
>> changing the file data.
> File data is not changed, as btrfs just COW the last tailing page, as reset the last already 0 part.
>
> Like the follow ascii arts:
>
> 0)Before
> 0    4K    8K    12K    16K
> |///////////////////////////|000|
> |<----------Extent A----------->|
> The file is 14K size, on disk(to be accurate, btrfs logical address space) it takes 16K, with last 2K padding 0.
>
> And all that 16K is in extent A.
>
> 1)Fallocate 0~14K
> In fact, all space in range 0~14K is allocated, so there is no need to reallocate any space.
>
> 2)But in btrfs
> Result will be:
> 0    4K    8K    12K    16K
> |///////////////////////////|000|
> |<------Extent A------->|<--B-->|
>
> Btrfs has a wrong judgment, which will always re-padding the last page.
> Causing a new extent, extent B to be created.
> Even the contents is the same with original last page.
>
> It's OK not to consider it as a bug, at least data is not corrupted.
> But IMO the btrfs behavior is not needed and need optimization.

> So kernel patch is submitted to btrfs ml:
> https://patchwork.kernel.org/patch/7284431/

Is this https://patchwork.kernel.org/patch/7283461/ ?  Right?

Thanks,
Tsutomu

>
> And if fstests is not the proper place, any idea where such "test case" should belong?
>
> Thanks,
> Qu
>
>>
>> Cheers,
>>
>> Dave.
>>
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

next prev parent reply	other threads:[~2015-09-30  1:45 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-29  9:34 [PATCH v2] fstests: generic: Check if a bull fallocate will change extent number Qu Wenruo
2015-09-29  9:55 ` Eryu Guan
2015-09-29 10:16   ` Qu Wenruo
2015-09-29 10:33     ` Eryu Guan
2015-09-29 10:46       ` Qu Wenruo
2015-09-29 10:00 ` Hugo Mills
2015-09-29 10:13   ` Qu Wenruo
2015-09-29 10:24     ` Filipe Manana
2015-09-29 10:48       ` Qu Wenruo
2015-09-30  6:42         ` Duncan
2015-09-29 10:24     ` Hugo Mills
2015-09-29 21:51 ` Dave Chinner
2015-09-30  1:05   ` Qu Wenruo
2015-09-30  1:45     ` Tsutomu Itoh [this message]
2015-09-30  1:49       ` Qu Wenruo
2015-09-30  4:20     ` Dave Chinner
2015-09-30  5:48       ` Qu Wenruo
2015-10-02  8:35   ` Qu Wenruo
2015-10-06  1:31     ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=560B3E9D.8050701@jp.fujitsu.com \
    --to=t-itoh@jp.fujitsu.com \
    --cc=fstests@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).