All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Cc: linux-btrfs@vger.kernel.org, fstests@vger.kernel.org
Subject: Re: [PATCH v2] fstests: generic: Check if a bull fallocate will change extent number
Date: Wed, 30 Sep 2015 09:49:50 +0800	[thread overview]
Message-ID: <560B3FBE.6080809@cn.fujitsu.com> (raw)
In-Reply-To: <560B3E9D.8050701@jp.fujitsu.com>



Tsutomu Itoh wrote on 2015/09/30 10:45 +0900:
> On 2015/09/30 10:05, Qu Wenruo wrote:
>> Dave Chinner wrote on 2015/09/30 07:51 +1000:
>>> On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
>>>> Normally, a bull fallocate call on a fully written and synced file
>>>> should not add an extent.
>>>
>>> Why not? Filesystems can do whatever they want with extents during
>>> a fallocate call. e.g. if the blocks are shared, then fallocate
>>> might break the block sharing so future overwrites don't get
>>> ENOSPC. This is a requirement set down by posix_fallocate(3)
>>>
>>> "After a successful call to posix_fallocate(), subsequent writes to
>>> bytes in the specified range are guaranteed not  to fail because of
>>> lack of disk space."
>>>
>>> Hence if you've got a file with shared blocks, a "full fallocate"
>>> must change the extent layout to break the sharing. As such, the
>>> premise of this test is wrong.
>>
>> First, btrfs never meets the posix_fallocate requirement by its COW
>> nature.
>>
>> Btrfs fallocate can only ensure at most next write will not cause ENOSPC.
>> Only when btrfs fallocate a new prealloc extent, next write into it
>> may use the space if they are not shared between different snapshots.
>>
>> Under most case, btrfs fallocate follows the behavior of other non-COW
>> filesystems.
>> Which means, btrfs won't alloc new extent if there is existing extent,
>> not matter if it's shared or not.
>>
>> As a result, fallocate in btrfs only works in a limited use-case, and
>> can easily break posix requirement.
>> Like the following case without snapshots:
>> 1)Fallocate 0~50M
>> 2)Write 0~50M        <- Will not return ENOSPC
>> 3)Write 0~25M        <- COW happens, allocate another 25M,
>>                 may cause ENOSPC.
>> Or even easier with snapshot:
>> 1)Fallocate 0~50M in subvol A
>> 2)Snapshot subvol A into snap B
>> 3)Write 0~25M in subvol A *OR* snap B <- COW happens, may cause ENOSPC
>>
>> As in step 3), fallocated 50M is shared, so write will be forced COW.
>>
>> So I'd prefer to make btrfs follows the behavior of other non-COW
>> filesystems, as posix standard doesn't fit well here.
>>>
>>> That's not to say that btrfs has a bug:
>>>
>>>> Btrfs has a bug to always truncate the last page if the fallocate start
>>>> offset is smaller than inode size.
>>>
>>> But it' not clear that this behaviour is actually a bug if it's not
>>> changing the file data.
>> File data is not changed, as btrfs just COW the last tailing page, as
>> reset the last already 0 part.
>>
>> Like the follow ascii arts:
>>
>> 0)Before
>> 0    4K    8K    12K    16K
>> |///////////////////////////|000|
>> |<----------Extent A----------->|
>> The file is 14K size, on disk(to be accurate, btrfs logical address
>> space) it takes 16K, with last 2K padding 0.
>>
>> And all that 16K is in extent A.
>>
>> 1)Fallocate 0~14K
>> In fact, all space in range 0~14K is allocated, so there is no need to
>> reallocate any space.
>>
>> 2)But in btrfs
>> Result will be:
>> 0    4K    8K    12K    16K
>> |///////////////////////////|000|
>> |<------Extent A------->|<--B-->|
>>
>> Btrfs has a wrong judgment, which will always re-padding the last page.
>> Causing a new extent, extent B to be created.
>> Even the contents is the same with original last page.
>>
>> It's OK not to consider it as a bug, at least data is not corrupted.
>> But IMO the btrfs behavior is not needed and need optimization.
>
>> So kernel patch is submitted to btrfs ml:
>> https://patchwork.kernel.org/patch/7284431/
>
> Is this https://patchwork.kernel.org/patch/7283461/ ?  Right?
>
> Thanks,
> Tsutomu

Oh, my fault, 728461 is the right patch...

Thanks for pointing this out,
Qu
>
>>
>> And if fstests is not the proper place, any idea where such "test
>> case" should belong?
>>
>> Thanks,
>> Qu
>>
>>>
>>> Cheers,
>>>
>>> Dave.
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe fstests" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

WARNING: multiple messages have this Message-ID (diff)
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Cc: <linux-btrfs@vger.kernel.org>, <fstests@vger.kernel.org>
Subject: Re: [PATCH v2] fstests: generic: Check if a bull fallocate will change extent number
Date: Wed, 30 Sep 2015 09:49:50 +0800	[thread overview]
Message-ID: <560B3FBE.6080809@cn.fujitsu.com> (raw)
In-Reply-To: <560B3E9D.8050701@jp.fujitsu.com>



Tsutomu Itoh wrote on 2015/09/30 10:45 +0900:
> On 2015/09/30 10:05, Qu Wenruo wrote:
>> Dave Chinner wrote on 2015/09/30 07:51 +1000:
>>> On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
>>>> Normally, a bull fallocate call on a fully written and synced file
>>>> should not add an extent.
>>>
>>> Why not? Filesystems can do whatever they want with extents during
>>> a fallocate call. e.g. if the blocks are shared, then fallocate
>>> might break the block sharing so future overwrites don't get
>>> ENOSPC. This is a requirement set down by posix_fallocate(3)
>>>
>>> "After a successful call to posix_fallocate(), subsequent writes to
>>> bytes in the specified range are guaranteed not  to fail because of
>>> lack of disk space."
>>>
>>> Hence if you've got a file with shared blocks, a "full fallocate"
>>> must change the extent layout to break the sharing. As such, the
>>> premise of this test is wrong.
>>
>> First, btrfs never meets the posix_fallocate requirement by its COW
>> nature.
>>
>> Btrfs fallocate can only ensure at most next write will not cause ENOSPC.
>> Only when btrfs fallocate a new prealloc extent, next write into it
>> may use the space if they are not shared between different snapshots.
>>
>> Under most case, btrfs fallocate follows the behavior of other non-COW
>> filesystems.
>> Which means, btrfs won't alloc new extent if there is existing extent,
>> not matter if it's shared or not.
>>
>> As a result, fallocate in btrfs only works in a limited use-case, and
>> can easily break posix requirement.
>> Like the following case without snapshots:
>> 1)Fallocate 0~50M
>> 2)Write 0~50M        <- Will not return ENOSPC
>> 3)Write 0~25M        <- COW happens, allocate another 25M,
>>                 may cause ENOSPC.
>> Or even easier with snapshot:
>> 1)Fallocate 0~50M in subvol A
>> 2)Snapshot subvol A into snap B
>> 3)Write 0~25M in subvol A *OR* snap B <- COW happens, may cause ENOSPC
>>
>> As in step 3), fallocated 50M is shared, so write will be forced COW.
>>
>> So I'd prefer to make btrfs follows the behavior of other non-COW
>> filesystems, as posix standard doesn't fit well here.
>>>
>>> That's not to say that btrfs has a bug:
>>>
>>>> Btrfs has a bug to always truncate the last page if the fallocate start
>>>> offset is smaller than inode size.
>>>
>>> But it' not clear that this behaviour is actually a bug if it's not
>>> changing the file data.
>> File data is not changed, as btrfs just COW the last tailing page, as
>> reset the last already 0 part.
>>
>> Like the follow ascii arts:
>>
>> 0)Before
>> 0    4K    8K    12K    16K
>> |///////////////////////////|000|
>> |<----------Extent A----------->|
>> The file is 14K size, on disk(to be accurate, btrfs logical address
>> space) it takes 16K, with last 2K padding 0.
>>
>> And all that 16K is in extent A.
>>
>> 1)Fallocate 0~14K
>> In fact, all space in range 0~14K is allocated, so there is no need to
>> reallocate any space.
>>
>> 2)But in btrfs
>> Result will be:
>> 0    4K    8K    12K    16K
>> |///////////////////////////|000|
>> |<------Extent A------->|<--B-->|
>>
>> Btrfs has a wrong judgment, which will always re-padding the last page.
>> Causing a new extent, extent B to be created.
>> Even the contents is the same with original last page.
>>
>> It's OK not to consider it as a bug, at least data is not corrupted.
>> But IMO the btrfs behavior is not needed and need optimization.
>
>> So kernel patch is submitted to btrfs ml:
>> https://patchwork.kernel.org/patch/7284431/
>
> Is this https://patchwork.kernel.org/patch/7283461/ ?  Right?
>
> Thanks,
> Tsutomu

Oh, my fault, 728461 is the right patch...

Thanks for pointing this out,
Qu
>
>>
>> And if fstests is not the proper place, any idea where such "test
>> case" should belong?
>>
>> Thanks,
>> Qu
>>
>>>
>>> Cheers,
>>>
>>> Dave.
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe fstests" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

  reply	other threads:[~2015-09-30  1:49 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-29  9:34 [PATCH v2] fstests: generic: Check if a bull fallocate will change extent number Qu Wenruo
2015-09-29  9:34 ` Qu Wenruo
2015-09-29  9:55 ` Eryu Guan
2015-09-29 10:16   ` Qu Wenruo
2015-09-29 10:33     ` Eryu Guan
2015-09-29 10:46       ` Qu Wenruo
2015-09-29 10:00 ` Hugo Mills
2015-09-29 10:13   ` Qu Wenruo
2015-09-29 10:24     ` Filipe Manana
2015-09-29 10:48       ` Qu Wenruo
2015-09-30  6:42         ` Duncan
2015-09-29 10:24     ` Hugo Mills
2015-09-29 21:51 ` Dave Chinner
2015-09-30  1:05   ` Qu Wenruo
2015-09-30  1:05     ` Qu Wenruo
2015-09-30  1:45     ` Tsutomu Itoh
2015-09-30  1:49       ` Qu Wenruo [this message]
2015-09-30  1:49         ` Qu Wenruo
2015-09-30  4:20     ` Dave Chinner
2015-09-30  5:48       ` Qu Wenruo
2015-09-30  5:48         ` Qu Wenruo
2015-10-02  8:35   ` Qu Wenruo
2015-10-06  1:31     ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=560B3FBE.6080809@cn.fujitsu.com \
    --to=quwenruo@cn.fujitsu.com \
    --cc=fstests@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=t-itoh@jp.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.