Linux Btrfs filesystem development
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: fdmanana@gmail.com, Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>,
	Martin Doucha <martin.doucha@suse.com>,
	Anand Jain <anand.jain@oracle.com>
Subject: Re: [PATCH] btrfs: Allow btrfs_truncate_block() to fallback to nocow for data space reservation
Date: Thu, 30 Jan 2020 18:36:28 +0800
Message-ID: <8341b76f-bcbe-f2b9-d8b0-cfcd0006a47c@gmx.com>
In-Reply-To: <CAL3q7H4ODcwn7LVm=P3BBL7zd3wGRB_Vtr_KNk_2MysNNwgqcQ@mail.gmail.com>


On 2020/1/30 6:02 PM, Filipe Manana wrote:
> On Thu, Jan 30, 2020 at 5:30 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> [BUG]
>> When the data space is exhausted, even if the inode has the NOCOW
>> attribute, btrfs will still refuse to truncate an unaligned range due to
>> ENOSPC.
>>
>> The following script can reproduce it pretty easily:
>>   #!/bin/bash
>>
>>   dev=/dev/test/test
>>   mnt=/mnt/btrfs
>>
>>   umount $dev &> /dev/null
>>   umount $mnt &> /dev/null
>>
>>   mkfs.btrfs -f $dev -b 1G
>>   mount -o nospace_cache $dev $mnt
>>   touch $mnt/foobar
>>   chattr +C $mnt/foobar
>>
>>   xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null
>>   xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null
>>   sync
>>
>>   xfs_io -c "fpunch 0 2k" $mnt/foobar
>>   umount $mnt
>>
>> Current btrfs will fail at the fpunch part.
>>
>> [CAUSE]
>> btrfs_truncate_block() always reserves data space without checking the
>> NOCOW attribute.
>>
>> Since the writeback path already follows the NOCOW bit, only the space
>> reservation code in btrfs_truncate_block() needs to change.
>>
>> [FIX]
>> Make btrfs_truncate_block() follow btrfs_buffered_write(): try to
>> reserve data space first, and fall back to the NOCOW check only when
>> there is not enough space.
>>
>> This always-try-to-reserve-first approach is an optimization introduced
>> in btrfs_buffered_write() to avoid the expensive btrfs_check_can_nocow()
>> call.
>>
>> Since check_can_nocow() is now needed outside of file.c, also export it
>> and rename it to btrfs_check_can_nocow().
>>
>> Reported-by: Martin Doucha <martin.doucha@suse.com>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> Test case will be submitted to fstests by the reporter.
> 
> Well, this is a sudden change of mind, isn't it? :)
> 
> We had btrfs/172, which you removed very recently, that precisely tested this:
> 
> https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?id=538d8a4bcc782258f8f95fae815d5e859dee9126

I didn't notice the nodatacow mount option. Super duper big facepalm.

This is entirely my fault, and I feel especially sorry for Anand.

With the nodatacow mount option there, that test case in fact makes a lot
of sense.
Sorry again for that.

Anand, would you mind resubmitting it to the generic group?

Thanks,
Qu

> 
> Even though there are several reasons why this can still fail (at
> writeback time), like regular buffered writes through the family of
> write() syscalls can, I think it's perfectly fine to have this
> behaviour.
> 
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> 
> So I think we can just resurrect btrfs/172 now...
> 
>> ---
>>  fs/btrfs/ctree.h |  2 ++
>>  fs/btrfs/file.c  | 10 +++++-----
>>  fs/btrfs/inode.c | 41 ++++++++++++++++++++++++++++++++++-------
>>  3 files changed, 41 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 54efb21c2727..b5639f3461e4 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -2954,6 +2954,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
>>  loff_t btrfs_remap_file_range(struct file *file_in, loff_t pos_in,
>>                               struct file *file_out, loff_t pos_out,
>>                               loff_t len, unsigned int remap_flags);
>> +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
>> +                         size_t *write_bytes);
>>
>>  /* tree-defrag.c */
>>  int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index 8d47c76b7bd1..8dc084600f4e 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -1544,8 +1544,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
>>         return ret;
>>  }
>>
>> -static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
>> -                                   size_t *write_bytes)
>> +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
>> +                         size_t *write_bytes)
>>  {
>>         struct btrfs_fs_info *fs_info = inode->root->fs_info;
>>         struct btrfs_root *root = inode->root;
>> @@ -1645,8 +1645,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>>                 if (ret < 0) {
>>                         if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>>                                                       BTRFS_INODE_PREALLOC)) &&
>> -                           check_can_nocow(BTRFS_I(inode), pos,
>> -                                       &write_bytes) > 0) {
>> +                           btrfs_check_can_nocow(BTRFS_I(inode), pos,
>> +                                                 &write_bytes) > 0) {
>>                                 /*
>>                                  * For nodata cow case, no need to reserve
>>                                  * data space.
>> @@ -1923,7 +1923,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
>>                  */
>>                 if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>>                                               BTRFS_INODE_PREALLOC)) ||
>> -                   check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
>> +                   btrfs_check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
>>                         inode_unlock(inode);
>>                         return -EAGAIN;
>>                 }
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 5509c41a4f43..b5ae4bbf1ad4 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -4974,11 +4974,13 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>>         struct extent_state *cached_state = NULL;
>>         struct extent_changeset *data_reserved = NULL;
>>         char *kaddr;
>> +       bool only_release_metadata = false;
>>         u32 blocksize = fs_info->sectorsize;
>>         pgoff_t index = from >> PAGE_SHIFT;
>>         unsigned offset = from & (blocksize - 1);
>>         struct page *page;
>>         gfp_t mask = btrfs_alloc_write_mask(mapping);
>> +       size_t write_bytes = blocksize;
>>         int ret = 0;
>>         u64 block_start;
>>         u64 block_end;
>> @@ -4990,11 +4992,26 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>>         block_start = round_down(from, blocksize);
>>         block_end = block_start + blocksize - 1;
>>
>> -       ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
>> -                                          block_start, blocksize);
>> -       if (ret)
>> +       ret = btrfs_check_data_free_space(inode, &data_reserved, block_start,
>> +                                         blocksize);
>> +       if (ret < 0) {
>> +               if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>> +                                             BTRFS_INODE_PREALLOC)) &&
>> +                   btrfs_check_can_nocow(BTRFS_I(inode), block_start,
>> +                                         &write_bytes) > 0) {
>> +                       /* For nocow case, no need to reserve data space. */
>> +                       only_release_metadata = true;
>> +               } else {
>> +                       goto out;
>> +               }
>> +       }
>> +       ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), blocksize);
>> +       if (ret < 0) {
>> +               if (!only_release_metadata)
>> +                       btrfs_free_reserved_data_space(inode, data_reserved,
>> +                                       block_start, blocksize);
>>                 goto out;
>> -
>> +       }
>>  again:
>>         page = find_or_create_page(mapping, index, mask);
>>         if (!page) {
>> @@ -5063,10 +5080,20 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>>         set_page_dirty(page);
>>         unlock_extent_cached(io_tree, block_start, block_end, &cached_state);
>>
>> +       if (only_release_metadata)
>> +               set_extent_bit(&BTRFS_I(inode)->io_tree, block_start,
>> +                               block_end, EXTENT_NORESERVE, NULL, NULL,
>> +                               GFP_NOFS);
>> +
>>  out_unlock:
>> -       if (ret)
>> -               btrfs_delalloc_release_space(inode, data_reserved, block_start,
>> -                                            blocksize, true);
>> +       if (ret) {
>> +               if (!only_release_metadata)
>> +                       btrfs_delalloc_release_space(inode, data_reserved,
>> +                                       block_start, blocksize, true);
>> +               else
>> +                       btrfs_delalloc_release_metadata(BTRFS_I(inode),
>> +                                       blocksize, true);
> 
> I usually find it more intuitive to have it the other way around:
> 
> if (only_release_metadata)
>   ...
> else
>   ...
> 
> E.g., positive case first, negative in the else branch. But that's
> likely too much of a personal preference.
> 
> Thanks.
> 
>> +       }
>>         btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
>>         unlock_page(page);
>>         put_page(page);
>> --
>> 2.25.0
>>
> 
> 
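
For readers following the thread, the reservation order described in the
[FIX] section can be sketched as a small userspace model. This is only an
illustration under stated assumptions, not the kernel code: every toy_*
name is hypothetical, free space is reduced to two byte counters, and the
NOCOW check is reduced to two booleans. The real patch also sets
EXTENT_NORESERVE on the range and has a separate error path at out_unlock,
both of which the model leaves out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the reservation order discussed above.
 * None of the toy_* names exist in the kernel; they are assumptions made
 * for illustration only.
 */
struct toy_fs {
	size_t data_free;	/* unreserved data bytes */
	size_t meta_free;	/* unreserved metadata bytes */
};

struct toy_inode {
	bool nodatacow;		/* stands in for the NODATACOW/PREALLOC flags */
	bool extent_shared;	/* a shared extent must still be COWed */
};

struct toy_reservation {
	bool data_reserved;
	bool meta_reserved;
};

/* Take bytes from a pool, or fail with -ENOSPC and leave it untouched. */
static int toy_reserve(size_t *pool, size_t bytes)
{
	if (*pool < bytes)
		return -ENOSPC;
	*pool -= bytes;
	return 0;
}

/* Returns > 0 when the block can be overwritten in place. */
static int toy_can_nocow(const struct toy_inode *inode)
{
	return inode->nodatacow && !inode->extent_shared;
}

static int toy_truncate_block_reserve(struct toy_fs *fs,
				      const struct toy_inode *inode,
				      size_t blocksize, size_t meta_bytes,
				      struct toy_reservation *res)
{
	bool only_metadata = false;
	int ret;

	assert(fs && inode && res);
	res->data_reserved = false;
	res->meta_reserved = false;

	/* Cheap path first: try the data reservation. */
	ret = toy_reserve(&fs->data_free, blocksize);
	if (ret < 0) {
		/* Only on failure pay for the NOCOW check. */
		if (toy_can_nocow(inode) > 0)
			only_metadata = true;
		else
			return ret;
	}

	ret = toy_reserve(&fs->meta_free, meta_bytes);
	if (ret < 0) {
		/* Undo the data reservation only if one was made. */
		if (!only_metadata)
			fs->data_free += blocksize;
		return ret;
	}

	res->data_reserved = !only_metadata;
	res->meta_reserved = true;
	return 0;
}
```

In this model, a full data pool makes the call fail with -ENOSPC for a
regular inode, while a NOCOW inode whose extent is not shared proceeds with
a metadata-only reservation, which mirrors the only_release_metadata flag
in the patch.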



