linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
To: Josef Bacik <jbacik@fb.com>, <linux-btrfs@vger.kernel.org>
Cc: <dsterba@suse.cz>, <holger@applied-asynchrony.com>
Subject: Re: [PATCH 4/4] btrfs: update btrfs_space_info's bytes_may_use timely
Date: Thu, 21 Jul 2016 09:18:03 +0800	[thread overview]
Message-ID: <579022CB.5040002@cn.fujitsu.com> (raw)
In-Reply-To: <8e602903-3e20-fa6c-3d00-4c22519d66e1@fb.com>

hello,

On 07/20/2016 09:35 PM, Josef Bacik wrote:
> On 07/20/2016 01:56 AM, Wang Xiaoguang wrote:
>> This patch can fix some false ENOSPC errors, below test script can
>> reproduce one false ENOSPC error:
>>     #!/bin/bash
>>     dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
>>     dev=$(losetup --show -f fs.img)
>>     mkfs.btrfs -f -M $dev
>>     mkdir /tmp/mntpoint
>>     mount $dev /tmp/mntpoint
>>     cd /tmp/mntpoint
>>     xfs_io -f -c "falloc 0 $((64*1024*1024))" testfile
>>
>> Above script will fail for ENOSPC reason, but indeed fs still has free
>> space to satisfy this request. Please see call graph:
>> btrfs_fallocate()
>> |-> btrfs_alloc_data_chunk_ondemand()
>> |   bytes_may_use += 64M
>> |-> btrfs_prealloc_file_range()
>>     |-> btrfs_reserve_extent()
>>         |-> btrfs_add_reserved_bytes()
>>         |   alloc_type is RESERVE_ALLOC_NO_ACCOUNT, so it does not
>>         |   change bytes_may_use, and bytes_reserved += 64M. Now
>>         |   bytes_may_use + bytes_reserved == 128M, which is greater
>>         |   than btrfs_space_info's total_bytes, false enospc occurs.
>>         |   Note, the bytes_may_use decrease operation will done in
>>         |   end of btrfs_fallocate(), which is too late.
>>
>> Here is another simple case for buffered write:
>>                     CPU 1              |              CPU 2
>>                                        |
>> |-> cow_file_range()                   |-> __btrfs_buffered_write()
>>     |-> btrfs_reserve_extent()         |   |
>>     |                                  |   |
>>     |                                  |   |
>>     |    .....                         |   |-> 
>> btrfs_check_data_free_space()
>>     |                                  |
>>     |                                  |
>>     |-> extent_clear_unlock_delalloc() |
>>
>> In CPU 1, btrfs_reserve_extent()->find_free_extent()->
>> btrfs_add_reserved_bytes() do not decrease bytes_may_use, the decrease
>> operation will be delayed to be done in extent_clear_unlock_delalloc().
>> Assume in this case, btrfs_reserve_extent() reserved 128MB data, CPU2's
>> btrfs_check_data_free_space() tries to reserve 100MB data space.
>> If
>>     100MB > data_sinfo->total_bytes - data_sinfo->bytes_used -
>>         data_sinfo->bytes_reserved - data_sinfo->bytes_pinned -
>>         data_sinfo->bytes_readonly - data_sinfo->bytes_may_use
>> btrfs_check_data_free_space() will try to allcate new data chunk or call
>> btrfs_start_delalloc_roots(), or commit current transaction inorder to
>> reserve some free space, obviously a lot of work. But indeed it's not
>> necessary as long as decreasing bytes_may_use timely, we still have
>> free space, decreasing 128M from bytes_may_use.
>>
>> To fix this issue, this patch chooses to update bytes_may_use for both
>> data and metadata in btrfs_add_reserved_bytes(). For compress path, real
>> extent length may not be equal to file content length, so introduce a
>> ram_bytes argument for btrfs_reserve_extent(), find_free_extent() and
>> btrfs_add_reserved_bytes(), it's becasue bytes_may_use is increased by
>> file content length. Then compress path can update bytes_may_use
>> correctly. Also now we can discard RESERVE_ALLOC_NO_ACCOUNT, 
>> RESERVE_ALLOC
>> and RESERVE_FREE.
>>
>> For inode marked as NODATACOW or extent marked as PREALLOC, we can
>> directly call btrfs_free_reserved_data_space() to adjust bytes_may_use.
>>
>> Meanwhile __btrfs_prealloc_file_range() will call
>> btrfs_free_reserved_data_space() internally for both sucessful and 
>> failed
>> path, btrfs_prealloc_file_range()'s callers does not need to call
>> btrfs_free_reserved_data_space() any more.
>>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>> ---
>>  fs/btrfs/ctree.h       |  2 +-
>>  fs/btrfs/extent-tree.c | 56 
>> +++++++++++++++++---------------------------------
>>  fs/btrfs/file.c        | 26 +++++++++++++----------
>>  fs/btrfs/inode-map.c   |  3 +--
>>  fs/btrfs/inode.c       | 37 ++++++++++++++++++++++++---------
>>  fs/btrfs/relocation.c  | 11 ++++++++--
>>  6 files changed, 72 insertions(+), 63 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 4274a7b..7eb2913 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -2556,7 +2556,7 @@ int btrfs_alloc_logged_file_extent(struct 
>> btrfs_trans_handle *trans,
>>                     struct btrfs_root *root,
>>                     u64 root_objectid, u64 owner, u64 offset,
>>                     struct btrfs_key *ins);
>> -int btrfs_reserve_extent(struct btrfs_root *root, u64 num_bytes,
>> +int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes, u64 
>> num_bytes,
>>               u64 min_alloc_size, u64 empty_size, u64 hint_byte,
>>               struct btrfs_key *ins, int is_data, int delalloc);
>>  int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct 
>> btrfs_root *root,
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 8eaac39..5447973 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -60,21 +60,6 @@ enum {
>>      CHUNK_ALLOC_FORCE = 2,
>>  };
>>
>> -/*
>> - * Control how reservations are dealt with.
>> - *
>> - * RESERVE_FREE - freeing a reservation.
>> - * RESERVE_ALLOC - allocating space and we need to update 
>> bytes_may_use for
>> - *   ENOSPC accounting
>> - * RESERVE_ALLOC_NO_ACCOUNT - allocating space and we should not update
>> - *   bytes_may_use as the ENOSPC accounting is done elsewhere
>> - */
>> -enum {
>> -    RESERVE_FREE = 0,
>> -    RESERVE_ALLOC = 1,
>> -    RESERVE_ALLOC_NO_ACCOUNT = 2,
>> -};
>> -
>>  static int update_block_group(struct btrfs_trans_handle *trans,
>>                    struct btrfs_root *root, u64 bytenr,
>>                    u64 num_bytes, int alloc);
>> @@ -105,7 +90,7 @@ static int find_next_key(struct btrfs_path *path, 
>> int level,
>>  static void dump_space_info(struct btrfs_space_info *info, u64 bytes,
>>                  int dump_block_groups);
>>  static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache 
>> *cache,
>> -                    u64 num_bytes, int reserve, int delalloc);
>> +                    u64 ram_bytes, u64 num_bytes, int delalloc);
>>  static int btrfs_free_reserved_bytes(struct btrfs_block_group_cache 
>> *cache,
>>                       u64 num_bytes, int delalloc);
>>  static int block_rsv_use_bytes(struct btrfs_block_rsv *block_rsv,
>> @@ -3491,7 +3476,6 @@ again:
>>          dcs = BTRFS_DC_SETUP;
>>      else if (ret == -ENOSPC)
>>          set_bit(BTRFS_TRANS_CACHE_ENOSPC, &trans->transaction->flags);
>> -    btrfs_free_reserved_data_space(inode, 0, num_pages);
>>
>>  out_put:
>>      iput(inode);
>> @@ -6300,8 +6284,9 @@ void btrfs_wait_block_group_reservations(struct 
>> btrfs_block_group_cache *bg)
>>  /**
>>   * btrfs_add_reserved_bytes - update the block_group and space info 
>> counters
>>   * @cache:    The cache we are manipulating
>> + * @ram_bytes:  The number of bytes of file content, and will be 
>> same to
>> + *              @num_bytes except for the compress path.
>>   * @num_bytes:    The number of bytes in question
>> - * @reserve:    One of the reservation enums
>>   * @delalloc:   The blocks are allocated for the delalloc write
>>   *
>>   * This is called by the allocator when it reserves space. Metadata
>> @@ -6316,7 +6301,7 @@ void btrfs_wait_block_group_reservations(struct 
>> btrfs_block_group_cache *bg)
>>   * succeeds.
>>   */
>>  static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache 
>> *cache,
>> -                    u64 num_bytes, int reserve, int delalloc)
>> +                    u64 ram_bytes, u64 num_bytes, int delalloc)
>>  {
>>      struct btrfs_space_info *space_info = cache->space_info;
>>      int ret = 0;
>> @@ -6328,13 +6313,11 @@ static int btrfs_add_reserved_bytes(struct 
>> btrfs_block_group_cache *cache,
>>      } else {
>>          cache->reserved += num_bytes;
>>          space_info->bytes_reserved += num_bytes;
>> -        if (reserve == RESERVE_ALLOC) {
>> -            trace_btrfs_space_reservation(cache->fs_info,
>> -                    "space_info", space_info->flags,
>> -                    num_bytes, 0);
>> -            space_info->bytes_may_use -= num_bytes;
>> -        }
>>
>> +        trace_btrfs_space_reservation(cache->fs_info,
>> +                "space_info", space_info->flags,
>> +                num_bytes, 0);
>
> This needs to be ram_bytes to keep the accounting consistent for tools 
> that use these tracepoints.  Thanks,
OK, I'll fix this issue in later version.

Regards,
Xiaoguang Wang
>
> Josef
>
>




  reply	other threads:[~2016-07-21  1:20 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-20  5:56 [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Wang Xiaoguang
2016-07-20  5:56 ` [PATCH 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() Wang Xiaoguang
2016-07-20 13:18   ` Josef Bacik
2016-07-21  1:49     ` Wang Xiaoguang
2016-07-21 13:05       ` Josef Bacik
2016-07-20  5:56 ` [PATCH 2/4] btrfs: divide btrfs_update_reserved_bytes() into two functions Wang Xiaoguang
2016-07-20 13:21   ` Josef Bacik
2016-07-20  5:56 ` [PATCH 3/4] btrfs: introduce new EXTENT_CLEAR_DATA_RESV flag Wang Xiaoguang
2016-07-20 13:22   ` Josef Bacik
2016-07-21  1:15     ` Wang Xiaoguang
2016-07-20  5:56 ` [PATCH 4/4] btrfs: update btrfs_space_info's bytes_may_use timely Wang Xiaoguang
2016-07-20 13:35   ` Josef Bacik
2016-07-21  1:18     ` Wang Xiaoguang [this message]
2016-07-20  8:46 ` [PATCH 0/4] update bytes_may_use timely to avoid false ENOSPC issue Holger Hoffstätte
2016-07-21  1:51   ` Wang Xiaoguang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=579022CB.5040002@cn.fujitsu.com \
    --to=wangxg.fnst@cn.fujitsu.com \
    --cc=dsterba@suse.cz \
    --cc=holger@applied-asynchrony.com \
    --cc=jbacik@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).