From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from cn.fujitsu.com ([222.73.24.84]:19683 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1752955Ab2IMDgE (ORCPT ); Wed, 12 Sep 2012 23:36:04 -0400 Message-ID: <5051549B.7070206@cn.fujitsu.com> Date: Thu, 13 Sep 2012 11:35:55 +0800 From: Miao Xie Reply-To: miaox@cn.fujitsu.com MIME-Version: 1.0 To: Josef Bacik CC: linux-btrfs@vger.kernel.org Subject: Re: [PATCH] Btrfs: delay block group item insertion References: <1347473053-3158-1-git-send-email-jbacik@fusionio.com> In-Reply-To: <1347473053-3158-1-git-send-email-jbacik@fusionio.com> Content-Type: text/plain; charset=UTF-8 Sender: linux-btrfs-owner@vger.kernel.org List-ID: On wed, 12 Sep 2012 14:04:13 -0400, Josef Bacik wrote: > So we have lots of places where we try to preallocate chunks in order to > make sure we have enough space as we make our allocations. This has > historically meant that we're constantly tweaking when we should allocate a > new chunk, and historically we have gotten this horribly wrong so we way > over allocate either metadata or data. To try and keep this from happening > we are going to make it so that the block group item insertion is done out > of band at the end of a transaction. This will allow us to create chunks > even if we are trying to make an allocation for the extent tree. With this > patch my enospc tests run faster (didn't expect this) and more efficiently > use the disk space (this is what I wanted). Thanks, This patch also fixes a deadlock problem that happened when we add two or more small devices(< 4G) into a seed fs(the profile of metadata is RAID1), and a enospc problem when we add a small device (< 256M) into a big empty seed fs(> 60G). (My fix patch which is similar to this one is on the way, I'm a bit slow :) ) > @@ -1400,6 +1407,9 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, > */ > cur_trans->delayed_refs.flushing = 1; > > + if (!list_empty(&trans->new_bgs)) > + btrfs_create_pending_block_groups(trans, root); > + > ret = btrfs_run_delayed_refs(trans, root, 0); > if (ret) > goto cleanup_transaction; I think we can not make sure we won't allocate new chunks when we create the pending snapshots and write out the space cache and inode cache, so we should check ->new_bgs and call btrfs_create_pending_block_groups() when committing the cowonly tree roots. And beside that, We'd better add a BUG_ON() after we update the root tree to make sure there is no pending block group item left in the list. Thanks Miao