linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Cc: <linux-btrfs@vger.kernel.org>, <fstests@vger.kernel.org>
Subject: Re: [PATCH v2] fstests: generic: Check if a bull fallocate will change extent number
Date: Wed, 30 Sep 2015 09:49:50 +0800	[thread overview]
Message-ID: <560B3FBE.6080809@cn.fujitsu.com> (raw)
In-Reply-To: <560B3E9D.8050701@jp.fujitsu.com>



Tsutomu Itoh wrote on 2015/09/30 10:45 +0900:
> On 2015/09/30 10:05, Qu Wenruo wrote:
>> Dave Chinner wrote on 2015/09/30 07:51 +1000:
>>> On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
>>>> Normally, a bull fallocate call on a fully written and synced file
>>>> should not add an extent.
>>>
>>> Why not? Filesystems can do whatever they want with extents during
>>> a fallocate call. e.g. if the blocks are shared, then fallocate
>>> might break the block sharing so future overwrites don't get
>>> ENOSPC. This is a requirement set down by posix_fallocate(3)
>>>
>>> "After a successful call to posix_fallocate(), subsequent writes to
>>> bytes in the specified range are guaranteed not  to fail because of
>>> lack of disk space."
>>>
>>> Hence if you've got a file with shared blocks, a "full fallocate"
>>> must change the extent layout to break the sharing. As such, the
>>> premise of this test is wrong.
>>
>> First, btrfs never meets the posix_fallocate requirement by its COW
>> nature.
>>
>> Btrfs fallocate can only ensure at most next write will not cause ENOSPC.
>> Only when btrfs fallocate a new prealloc extent, next write into it
>> may use the space if they are not shared between different snapshots.
>>
>> Under most case, btrfs fallocate follows the behavior of other non-COW
>> filesystems.
>> Which means, btrfs won't alloc new extent if there is existing extent,
>> not matter if it's shared or not.
>>
>> As a result, fallocate in btrfs only works in a limited use-case, and
>> can easily break posix requirement.
>> Like the following case without snapshots:
>> 1)Fallocate 0~50M
>> 2)Write 0~50M        <- Will not return ENOSPC
>> 3)Write 0~25M        <- COW happens, allocate another 25M,
>>                 may cause ENOSPC.
>> Or even easier with snapshot:
>> 1)Fallocate 0~50M in subvol A
>> 2)Snapshot subvol A into snap B
>> 3)Write 0~25M in subvol A *OR* snap B <- COW happens, may cause ENOSPC
>>
>> As in step 3), fallocated 50M is shared, so write will be forced COW.
>>
>> So I'd prefer to make btrfs follows the behavior of other non-COW
>> filesystems, as posix standard doesn't fit well here.
>>>
>>> That's not to say that btrfs has a bug:
>>>
>>>> Btrfs has a bug to always truncate the last page if the fallocate start
>>>> offset is smaller than inode size.
>>>
>>> But it' not clear that this behaviour is actually a bug if it's not
>>> changing the file data.
>> File data is not changed, as btrfs just COW the last tailing page, as
>> reset the last already 0 part.
>>
>> Like the follow ascii arts:
>>
>> 0)Before
>> 0    4K    8K    12K    16K
>> |///////////////////////////|000|
>> |<----------Extent A----------->|
>> The file is 14K size, on disk(to be accurate, btrfs logical address
>> space) it takes 16K, with last 2K padding 0.
>>
>> And all that 16K is in extent A.
>>
>> 1)Fallocate 0~14K
>> In fact, all space in range 0~14K is allocated, so there is no need to
>> reallocate any space.
>>
>> 2)But in btrfs
>> Result will be:
>> 0    4K    8K    12K    16K
>> |///////////////////////////|000|
>> |<------Extent A------->|<--B-->|
>>
>> Btrfs has a wrong judgment, which will always re-padding the last page.
>> Causing a new extent, extent B to be created.
>> Even the contents is the same with original last page.
>>
>> It's OK not to consider it as a bug, at least data is not corrupted.
>> But IMO the btrfs behavior is not needed and need optimization.
>
>> So kernel patch is submitted to btrfs ml:
>> https://patchwork.kernel.org/patch/7284431/
>
> Is this https://patchwork.kernel.org/patch/7283461/ ?  Right?
>
> Thanks,
> Tsutomu

Oh, my fault, 728461 is the right patch...

Thanks for pointing this out,
Qu
>
>>
>> And if fstests is not the proper place, any idea where such "test
>> case" should belong?
>>
>> Thanks,
>> Qu
>>
>>>
>>> Cheers,
>>>
>>> Dave.
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe fstests" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

  reply	other threads:[~2015-09-30  1:49 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-29  9:34 [PATCH v2] fstests: generic: Check if a bull fallocate will change extent number Qu Wenruo
2015-09-29  9:55 ` Eryu Guan
2015-09-29 10:16   ` Qu Wenruo
2015-09-29 10:33     ` Eryu Guan
2015-09-29 10:46       ` Qu Wenruo
2015-09-29 10:00 ` Hugo Mills
2015-09-29 10:13   ` Qu Wenruo
2015-09-29 10:24     ` Filipe Manana
2015-09-29 10:48       ` Qu Wenruo
2015-09-30  6:42         ` Duncan
2015-09-29 10:24     ` Hugo Mills
2015-09-29 21:51 ` Dave Chinner
2015-09-30  1:05   ` Qu Wenruo
2015-09-30  1:45     ` Tsutomu Itoh
2015-09-30  1:49       ` Qu Wenruo [this message]
2015-09-30  4:20     ` Dave Chinner
2015-09-30  5:48       ` Qu Wenruo
2015-10-02  8:35   ` Qu Wenruo
2015-10-06  1:31     ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=560B3FBE.6080809@cn.fujitsu.com \
    --to=quwenruo@cn.fujitsu.com \
    --cc=fstests@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=t-itoh@jp.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).