From: Omar Sandoval <osandov@fb.com>
To: <fdmanana@kernel.org>
Cc: <linux-btrfs@vger.kernel.org>, Filipe Manana <fdmanana@suse.com>
Subject: Re: [PATCH] Btrfs: fix quick exhaustion of the system array in the superblock
Date: Tue, 21 Jul 2015 16:53:34 -0700 [thread overview]
Message-ID: <20150721235334.GA8113@huxley.DHCP.TheFacebook.com> (raw)
In-Reply-To: <1437400580-30525-1-git-send-email-fdmanana@kernel.org>
On Mon, Jul 20, 2015 at 02:56:20PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> Omar reported that after commit 4fbcdf669454 ("Btrfs: fix -ENOSPC when
> finishing block group creation"), introduced in 4.2-rc1, the following
> test was failing due to exhaustion of the system array in the superblock:
>
> #!/bin/bash
>
> truncate -s 100T big.img
> mkfs.btrfs big.img
> mount -o loop big.img /mnt/loop
>
> num=5
> sz=10T
> for ((i = 0; i < $num; i++)); do
> echo fallocate $i $sz
> fallocate -l $sz /mnt/loop/testfile$i
> done
> btrfs filesystem sync /mnt/loop
>
> for ((i = 0; i < $num; i++)); do
> echo rm $i
> rm /mnt/loop/testfile$i
> btrfs filesystem sync /mnt/loop
> done
> umount /mnt/loop
>
> This made btrfs_add_system_chunk() fail with -EFBIG due to excessive
> allocation of system block groups. This happened because the test creates
> a large number of data block groups per transaction and when committing
> the transaction we start the writeout of the block group caches for all
> the new new (dirty) block groups, which results in pre-allocating space
> for each block group's free space cache using the same transaction handle.
> That in turn often leads to creation of more block groups, and all get
> attached to the new_bgs list of the same transaction handle to the point
> of getting a list with over 1500 elements, and creation of new block groups
> leads to the need of reserving space in the chunk block reserve and often
> creating a new system block group too.
>
> So that made us quickly exhaust the chunk block reserve/system space info,
> because as of the commit mentioned before, we do reserve space for each
> new block group in the chunk block reserve, unlike before where we would
> not and would at most allocate one new system block group and therefore
> would only ensure that there was enough space in the system space info to
> allocate 1 new block group even if we ended up allocating thousands of
> new block groups using the same transaction handle. That worked most of
> the time because the computed required space at check_system_chunk() is
> very pessimistic (assumes a chunk tree height of BTRFS_MAX_LEVEL/8 and
> that all nodes/leafs in a path will be COWed and split) and since the
> updates to the chunk tree all happen at btrfs_create_pending_block_groups
> it is unlikely that a path needs to be COWed more than once (unless
> writepages() for the btree inode is called by mm in between) and that
> compensated for the need of creating any new nodes/leads in the chunk
> tree.
>
> So fix this by ensuring we don't accumulate a too large list of new block
> groups in a transaction's handles new_bgs list, inserting/updating the
> chunk tree for all accumulated new block groups and releasing the unused
> space from the chunk block reserve whenever the list becomes sufficiently
> large. This is a generic solution even though the problem currently can
> only happen when starting the writeout of the free space caches for all
> dirty block groups (btrfs_start_dirty_block_groups()).
>
> Reported-by: Omar Sandoval <osandov@fb.com>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Thanks a lot for taking a look.
Tested-by: Omar Sandoval <osandov@fb.com>
> ---
> fs/btrfs/extent-tree.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 171312d..07204bf 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4227,6 +4227,24 @@ out:
> space_info->chunk_alloc = 0;
> spin_unlock(&space_info->lock);
> mutex_unlock(&fs_info->chunk_mutex);
> + /*
> + * When we allocate a new chunk we reserve space in the chunk block
> + * reserve to make sure we can COW nodes/leafs in the chunk tree or
> + * add new nodes/leafs to it if we end up needing to do it when
> + * inserting the chunk item and updating device items as part of the
> + * second phase of chunk allocation, performed by
> + * btrfs_finish_chunk_alloc(). So make sure we don't accumulate a
> + * large number of new block groups to create in our transaction
> + * handle's new_bgs list to avoid exhausting the chunk block reserve
> + * in extreme cases - like having a single transaction create many new
> + * block groups when starting to write out the free space caches of all
> + * the block groups that were made dirty during the lifetime of the
> + * transaction.
> + */
> + if (trans->chunk_bytes_reserved >= (2 * 1024 * 1024ull)) {
> + btrfs_create_pending_block_groups(trans, trans->root);
> + btrfs_trans_release_chunk_metadata(trans);
> + }
> return ret;
> }
>
> --
> 2.1.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Omar
next prev parent reply other threads:[~2015-07-21 23:53 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-07-20 13:56 [PATCH] Btrfs: fix quick exhaustion of the system array in the superblock fdmanana
2015-07-21 23:53 ` Omar Sandoval [this message]
2015-12-13 10:29 ` Alex Lyakas
2015-12-13 15:45 ` Filipe Manana
2015-12-13 16:15 ` Alex Lyakas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150721235334.GA8113@huxley.DHCP.TheFacebook.com \
--to=osandov@fb.com \
--cc=fdmanana@kernel.org \
--cc=fdmanana@suse.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).