From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: fdmanana@gmail.com, Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>,
Martin Doucha <martin.doucha@suse.com>,
Anand Jain <anand.jain@oracle.com>
Subject: Re: [PATCH] btrfs: Allow btrfs_truncate_block() to fallback to nocow for data space reservation
Date: Thu, 30 Jan 2020 18:36:28 +0800 [thread overview]
Message-ID: <8341b76f-bcbe-f2b9-d8b0-cfcd0006a47c@gmx.com> (raw)
In-Reply-To: <CAL3q7H4ODcwn7LVm=P3BBL7zd3wGRB_Vtr_KNk_2MysNNwgqcQ@mail.gmail.com>
[-- Attachment #1.1: Type: text/plain, Size: 9281 bytes --]
On 2020/1/30 下午6:02, Filipe Manana wrote:
> On Thu, Jan 30, 2020 at 5:30 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> [BUG]
>> When the data space is exhausted, even the inode has NOCOW attribute,
>> btrfs will still refuse to truncate unaligned range due to ENOSPC.
>>
>> The following script can reproduce it pretty easily:
>> #!/bin/bash
>>
>> dev=/dev/test/test
>> mnt=/mnt/btrfs
>>
>> umount $dev &> /dev/null
>> umount $mnt&> /dev/null
>>
>> mkfs.btrfs -f $dev -b 1G
>> mount -o nospace_cache $dev $mnt
>> touch $mnt/foobar
>> chattr +C $mnt/foobar
>>
>> xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null
>> xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null
>> sync
>>
>> xfs_io -c "fpunch 0 2k" $mnt/foobar
>> umount $mnt
>>
>> Current btrfs will fail at the fpunch part.
>>
>> [CAUSE]
>> Because btrfs_truncate_block() always reserve space without checking the
>> NOCOW attribute.
>>
>> Since the writeback path follows NOCOW bit, we only need to bother the
>> space reservation code in btrfs_truncate_block().
>>
>> [FIX]
>> Make btrfs_truncate_block() to follow btrfs_buffered_write() to try to
>> reserve data space first, and falls back to NOCOW check only when we
>> don't have enough space.
>>
>> Such always-try-reserve is an optimization introduced in
>> btrfs_buffered_write(), to avoid expensive btrfs_check_can_nocow() call.
>>
>> Since now check_can_nocow() is needed outside of inode.c, also export it
>> and rename it to btrfs_check_can_nocow().
>>
>> Reported-by: Martin Doucha <martin.doucha@suse.com>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> Test case will be submitted to fstests by the reporter.
>
> Well, this is a sudden change of mind, isn't it? :)
>
> We had btrfs/172, which you removed very recently, that precisely tested this:
>
> https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?id=538d8a4bcc782258f8f95fae815d5e859dee9126
I didn't notice the nodatacow mount option. Super duper big facepalm.
All my bad, especially feel sorry for Anand.
With nodatacow mount option there, that test case in fact makes a lot of
sense.
Sorry again for that.
Anand, mind to resubmit it to generic group?
Thanks,
Qu
>
> Even though there are several reasons why this can still fail (at
> writeback time), like regular buffered writes through the family of
> write() syscalls can, I think it's perfectly fine to have this
> behaviour.
>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>
> So I think we can just resurrect btrfs/172 now...
>
>> ---
>> fs/btrfs/ctree.h | 2 ++
>> fs/btrfs/file.c | 10 +++++-----
>> fs/btrfs/inode.c | 41 ++++++++++++++++++++++++++++++++++-------
>> 3 files changed, 41 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 54efb21c2727..b5639f3461e4 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -2954,6 +2954,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
>> loff_t btrfs_remap_file_range(struct file *file_in, loff_t pos_in,
>> struct file *file_out, loff_t pos_out,
>> loff_t len, unsigned int remap_flags);
>> +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
>> + size_t *write_bytes);
>>
>> /* tree-defrag.c */
>> int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index 8d47c76b7bd1..8dc084600f4e 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -1544,8 +1544,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
>> return ret;
>> }
>>
>> -static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
>> - size_t *write_bytes)
>> +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
>> + size_t *write_bytes)
>> {
>> struct btrfs_fs_info *fs_info = inode->root->fs_info;
>> struct btrfs_root *root = inode->root;
>> @@ -1645,8 +1645,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>> if (ret < 0) {
>> if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>> BTRFS_INODE_PREALLOC)) &&
>> - check_can_nocow(BTRFS_I(inode), pos,
>> - &write_bytes) > 0) {
>> + btrfs_check_can_nocow(BTRFS_I(inode), pos,
>> + &write_bytes) > 0) {
>> /*
>> * For nodata cow case, no need to reserve
>> * data space.
>> @@ -1923,7 +1923,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
>> */
>> if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>> BTRFS_INODE_PREALLOC)) ||
>> - check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
>> + btrfs_check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
>> inode_unlock(inode);
>> return -EAGAIN;
>> }
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 5509c41a4f43..b5ae4bbf1ad4 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -4974,11 +4974,13 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>> struct extent_state *cached_state = NULL;
>> struct extent_changeset *data_reserved = NULL;
>> char *kaddr;
>> + bool only_release_metadata = false;
>> u32 blocksize = fs_info->sectorsize;
>> pgoff_t index = from >> PAGE_SHIFT;
>> unsigned offset = from & (blocksize - 1);
>> struct page *page;
>> gfp_t mask = btrfs_alloc_write_mask(mapping);
>> + size_t write_bytes = blocksize;
>> int ret = 0;
>> u64 block_start;
>> u64 block_end;
>> @@ -4990,11 +4992,26 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>> block_start = round_down(from, blocksize);
>> block_end = block_start + blocksize - 1;
>>
>> - ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
>> - block_start, blocksize);
>> - if (ret)
>> + ret = btrfs_check_data_free_space(inode, &data_reserved, block_start,
>> + blocksize);
>> + if (ret < 0) {
>> + if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>> + BTRFS_INODE_PREALLOC)) &&
>> + btrfs_check_can_nocow(BTRFS_I(inode), block_start,
>> + &write_bytes) > 0) {
>> + /* For nocow case, no need to reserve data space. */
>> + only_release_metadata = true;
>> + } else {
>> + goto out;
>> + }
>> + }
>> + ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), blocksize);
>> + if (ret < 0) {
>> + if (!only_release_metadata)
>> + btrfs_free_reserved_data_space(inode, data_reserved,
>> + block_start, blocksize);
>> goto out;
>> -
>> + }
>> again:
>> page = find_or_create_page(mapping, index, mask);
>> if (!page) {
>> @@ -5063,10 +5080,20 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>> set_page_dirty(page);
>> unlock_extent_cached(io_tree, block_start, block_end, &cached_state);
>>
>> + if (only_release_metadata)
>> + set_extent_bit(&BTRFS_I(inode)->io_tree, block_start,
>> + block_end, EXTENT_NORESERVE, NULL, NULL,
>> + GFP_NOFS);
>> +
>> out_unlock:
>> - if (ret)
>> - btrfs_delalloc_release_space(inode, data_reserved, block_start,
>> - blocksize, true);
>> + if (ret) {
>> + if (!only_release_metadata)
>> + btrfs_delalloc_release_space(inode, data_reserved,
>> + block_start, blocksize, true);
>> + else
>> + btrfs_delalloc_release_metadata(BTRFS_I(inode),
>> + blocksize, true);
>
> I usually find it more intuitive to have it the other way around:
>
> if (only_release_metadata)
> ...
> else
> ...
>
> E.g., positive case first, negative in the else branch. But that's
> likely too much of a personal preference.
>
> Thanks.
>
>> + }
>> btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
>> unlock_page(page);
>> put_page(page);
>> --
>> 2.25.0
>>
>
>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2020-01-30 10:36 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-01-30 5:28 [PATCH] btrfs: Allow btrfs_truncate_block() to fallback to nocow for data space reservation Qu Wenruo
2020-01-30 10:02 ` Filipe Manana
2020-01-30 10:36 ` Qu Wenruo [this message]
2020-01-30 10:46 ` Filipe Manana
2020-01-30 11:02 ` Qu Wenruo
2020-01-31 4:42 ` Anand Jain
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8341b76f-bcbe-f2b9-d8b0-cfcd0006a47c@gmx.com \
--to=quwenruo.btrfs@gmx.com \
--cc=anand.jain@oracle.com \
--cc=fdmanana@gmail.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=martin.doucha@suse.com \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox