From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from cn.fujitsu.com ([59.151.112.132]:16878 "EHLO
	heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org
	with ESMTP id S1752159AbbI3BFT (ORCPT
	<rfc822;fstests@vger.kernel.org>); Tue, 29 Sep 2015 21:05:19 -0400
Subject: Re: [PATCH v2] fstests: generic: Check if a bull fallocate will
 change extent number
References: <1443519264-19184-1-git-send-email-quwenruo@cn.fujitsu.com>
 <20150929215135.GZ3902@dastard>
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Message-ID: <560B354B.9050400@cn.fujitsu.com>
Date: Wed, 30 Sep 2015 09:05:15 +0800
MIME-Version: 1.0
In-Reply-To: <20150929215135.GZ3902@dastard>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
Sender: fstests-owner@vger.kernel.org
To: Dave Chinner <david@fromorbit.com>
Cc: linux-btrfs@vger.kernel.org, fstests@vger.kernel.org
List-ID: <fstests@vger.kernel.org>


Dave Chinner wrote on 2015/09/30 07:51 +1000:
> On Tue, Sep 29, 2015 at 05:34:24PM +0800, Qu Wenruo wrote:
>> Normally, a bull fallocate call on a fully written and synced file
>> should not add an extent.
>
> Why not? Filesystems can do whatever they want with extents during
> a fallocate call. e.g. if the blocks are shared, then fallocate
> might break the block sharing so future overwrites don't get
> ENOSPC. This is a requirement set down by posix_fallocate(3)
>
> "After a successful call to posix_fallocate(), subsequent writes to
> bytes in the specified range are guaranteed not  to fail because of
> lack of disk space."
>
> Hence if you've got a file with shared blocks, a "full fallocate"
> must change the extent layout to break the sharing. As such, the
> premise of this test is wrong.

First, btrfs never meets the posix_fallocate requirement by its COW nature.

Btrfs fallocate can only ensure at most next write will not cause ENOSPC.
Only when btrfs fallocate a new prealloc extent, next write into it may 
use the space if they are not shared between different snapshots.

Under most case, btrfs fallocate follows the behavior of other non-COW 
filesystems.
Which means, btrfs won't alloc new extent if there is existing extent, 
not matter if it's shared or not.

As a result, fallocate in btrfs only works in a limited use-case, and 
can easily break posix requirement.
Like the following case without snapshots:
1)Fallocate 0~50M
2)Write 0~50M		<- Will not return ENOSPC
3)Write 0~25M		<- COW happens, allocate another 25M,
			   may cause ENOSPC.
Or even easier with snapshot:
1)Fallocate 0~50M in subvol A
2)Snapshot subvol A into snap B
3)Write 0~25M in subvol A *OR* snap B <- COW happens, may cause ENOSPC

As in step 3), fallocated 50M is shared, so write will be forced COW.

So I'd prefer to make btrfs follows the behavior of other non-COW 
filesystems, as posix standard doesn't fit well here.
>
> That's not to say that btrfs has a bug:
>
>> Btrfs has a bug to always truncate the last page if the fallocate start
>> offset is smaller than inode size.
>
> But it' not clear that this behaviour is actually a bug if it's not
> changing the file data.
File data is not changed, as btrfs just COW the last tailing page, as 
reset the last already 0 part.

Like the follow ascii arts:

0)Before
0	4K	8K	12K	16K
|///////////////////////////|000|
|<----------Extent A----------->|
The file is 14K size, on disk(to be accurate, btrfs logical address 
space) it takes 16K, with last 2K padding 0.

And all that 16K is in extent A.

1)Fallocate 0~14K
In fact, all space in range 0~14K is allocated, so there is no need to 
reallocate any space.

2)But in btrfs
Result will be:
0	4K	8K	12K	16K
|///////////////////////////|000|
|<------Extent A------->|<--B-->|

Btrfs has a wrong judgment, which will always re-padding the last page.
Causing a new extent, extent B to be created.
Even the contents is the same with original last page.

It's OK not to consider it as a bug, at least data is not corrupted.
But IMO the btrfs behavior is not needed and need optimization.
So kernel patch is submitted to btrfs ml:
https://patchwork.kernel.org/patch/7284431/

And if fstests is not the proper place, any idea where such "test case" 
should belong?

Thanks,
Qu

>
> Cheers,
>
> Dave.
>