From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from cn.fujitsu.com ([59.151.112.132]:8757 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1754559AbbI3I0S convert rfc822-to-8bit (ORCPT ); Wed, 30 Sep 2015 04:26:18 -0400 From: Zhao Lei To: CC: References: <585889cfca0895674926445c0a8caa84f8fc6184.1443534660.git.zhaolei@cn.fujitsu.com> <1a179aa4a1d534e4b31d0978e72a7a391f9b408d.1443534660.git.zhaolei@cn.fujitsu.com> <00db01d0fb37$5b821850$128648f0$@cn.fujitsu.com> In-Reply-To: Subject: RE: [PATCH 2/2] btrfs: Fix lost-data-profile caused by balance bg Date: Wed, 30 Sep 2015 16:26:15 +0800 Message-ID: <00ff01d0fb59$acaa56d0$05ff0470$@cn.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Sender: linux-btrfs-owner@vger.kernel.org List-ID: Hi, Filipe Manana > -----Original Message----- > From: Filipe Manana [mailto:fdmanana@gmail.com] > Sent: Wednesday, September 30, 2015 3:41 PM > To: Zhao Lei > Cc: linux-btrfs@vger.kernel.org > Subject: Re: [PATCH 2/2] btrfs: Fix lost-data-profile caused by balance bg > > On Wed, Sep 30, 2015 at 5:20 AM, Zhao Lei wrote: > > Hi, Filipe Manana > > > > Thanks for reviewing. > > > >> -----Original Message----- > >> From: Filipe Manana [mailto:fdmanana@gmail.com] > >> Sent: Tuesday, September 29, 2015 11:48 PM > >> To: Zhao Lei > >> Cc: linux-btrfs@vger.kernel.org > >> Subject: Re: [PATCH 2/2] btrfs: Fix lost-data-profile caused by > >> balance bg > >> > >> On Tue, Sep 29, 2015 at 2:51 PM, Zhao Lei wrote: > >> > Reproduce: > >> > (In integration-4.3 branch) > >> > > >> > TEST_DEV=(/dev/vdg /dev/vdh) > >> > TEST_DIR=/mnt/tmp > >> > > >> > umount "$TEST_DEV" >/dev/null > >> > mkfs.btrfs -f -d raid1 "${TEST_DEV[@]}" > >> > > >> > mount -o nospace_cache "$TEST_DEV" "$TEST_DIR" > >> > btrfs balance start -dusage=0 $TEST_DIR btrfs filesystem usage > >> > $TEST_DIR > >> > > >> > dd if=/dev/zero of="$TEST_DIR"/file count=100 btrfs filesystem > >> > usage $TEST_DIR > >> > > >> > Result: > >> > We can see "no data chunk" in first "btrfs filesystem usage": > >> > # btrfs filesystem usage $TEST_DIR > >> > Overall: > >> > ... > >> > Metadata,single: Size:8.00MiB, Used:0.00B > >> > /dev/vdg 8.00MiB > >> > Metadata,RAID1: Size:122.88MiB, Used:112.00KiB > >> > /dev/vdg 122.88MiB > >> > /dev/vdh 122.88MiB > >> > System,single: Size:4.00MiB, Used:0.00B > >> > /dev/vdg 4.00MiB > >> > System,RAID1: Size:8.00MiB, Used:16.00KiB > >> > /dev/vdg 8.00MiB > >> > /dev/vdh 8.00MiB > >> > Unallocated: > >> > /dev/vdg 1.06GiB > >> > /dev/vdh 1.07GiB > >> > > >> > And "data chunks changed from raid1 to single" in second "btrfs > >> > filesystem usage": > >> > # btrfs filesystem usage $TEST_DIR > >> > Overall: > >> > ... > >> > Data,single: Size:256.00MiB, Used:0.00B > >> > /dev/vdh 256.00MiB > >> > Metadata,single: Size:8.00MiB, Used:0.00B > >> > /dev/vdg 8.00MiB > >> > Metadata,RAID1: Size:122.88MiB, Used:112.00KiB > >> > /dev/vdg 122.88MiB > >> > /dev/vdh 122.88MiB > >> > System,single: Size:4.00MiB, Used:0.00B > >> > /dev/vdg 4.00MiB > >> > System,RAID1: Size:8.00MiB, Used:16.00KiB > >> > /dev/vdg 8.00MiB > >> > /dev/vdh 8.00MiB > >> > Unallocated: > >> > /dev/vdg 1.06GiB > >> > /dev/vdh 841.92MiB > >> > > >> > Reason: > >> > btrfs balance delete last data chunk in case of no data in the > >> > filesystem, then we can see "no data chunk" by "fi usage" > >> > command. > >> > > >> > And when we do write operation to fs, the only available data > >> > profile is 0x0, result is all new chunks are allocated single type. > >> > > >> > Fix: > >> > Allocate a data chunk explicitly in balance operation, to reserve > >> > at least one data chunk in the filesystem. > >> > >> Allocate a data chunk explicitly to ensure we don't lose the raid profile for > data. > >> > > > > Thanks, will change in v2. > > > >> > > >> > Test: > >> > Test by above script, and confirmed the logic by debug output. > >> > > >> > Signed-off-by: Zhao Lei > >> > --- > >> > fs/btrfs/volumes.c | 19 +++++++++++++++++++ > >> > 1 file changed, 19 insertions(+) > >> > > >> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index > >> > 6fc73586..3d5e41e 100644 > >> > --- a/fs/btrfs/volumes.c > >> > +++ b/fs/btrfs/volumes.c > >> > @@ -3277,6 +3277,7 @@ static int __btrfs_balance(struct > >> > btrfs_fs_info > >> *fs_info) > >> > u64 limit_data = bctl->data.limit; > >> > u64 limit_meta = bctl->meta.limit; > >> > u64 limit_sys = bctl->sys.limit; > >> > + int chunk_reserved = 0; > >> > > >> > /* step one make some room on all the devices */ > >> > devices = &fs_info->fs_devices->devices; @@ -3387,6 > >> > +3388,24 @@ again: > >> > goto loop; > >> > } > >> > > >> > + if (!chunk_reserved) { > >> > + trans = btrfs_start_transaction(chunk_root, > 0); > >> > + if (IS_ERR(trans)) { > >> > + > >> mutex_unlock(&fs_info->delete_unused_bgs_mutex); > >> > + ret = PTR_ERR(trans); > >> > + goto error; > >> > + } > >> > + > >> > + ret = btrfs_force_chunk_alloc(trans, > >> > + chunk_root, 1); > >> > >> Can we please use the symbol BTRFS_BLOCK_GROUP_DATA instead of 1? > >> > > Thanks, will change in v2. > > > > > >> > + if (ret < 0) { > >> > + > >> mutex_unlock(&fs_info->delete_unused_bgs_mutex); > >> > + goto error; > >> > + } > >> > + > >> > + btrfs_end_transaction(trans, chunk_root); > >> > + chunk_reserved = 1; > >> > + } > >> > >> Can we do this logic only if the block group is a data one? I.e. no > >> point allocating a data block group if our balance only touches > >> metadata ones (due to some filter for e.g.). > >> > > Thanks for point out it, will change in v2. > > I find it somewhat awkward that we always allocate a new data block group no > matter what. Some balance operations that used to succeed in the past may > now fail with -ENOSPC for example... > > How about making this simpler and not so special for data block groups, like the > following (compile tested only): > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 644e070..067b1eb > 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -2774,6 +2774,8 @@ static int btrfs_relocate_chunk(struct btrfs_root > *root, u64 chunk_offset) > struct btrfs_root *extent_root; > struct btrfs_trans_handle *trans; > int ret; > + struct btrfs_block_group_cache *bg; > + bool remove = true; > > root = root->fs_info->chunk_root; > extent_root = root->fs_info->extent_root; @@ -2803,6 +2805,23 > @@ static int btrfs_relocate_chunk(struct btrfs_root *root, u64 chunk_offset) > if (ret) > return ret; > > + bg = btrfs_lookup_block_group(root->fs_info, chunk_offset); > + ASSERT(bg); > + down_read(&bg->space_info->groups_sem); > + /* > + * Do not remove the last block group of a kind to prevent losing > + * raid profile information for future chunk allocations. > + */ > + if (bg->list.next == bg->list.prev) > + remove = false; > + up_read(&bg->space_info->groups_sem); > + if (!remove) > + btrfs_dec_block_group_ro(extent_root, bg); > + btrfs_put_block_group(bg); > + > + if (!remove) > + return 0; > + > trans = btrfs_start_transaction(root, 0); > if (IS_ERR(trans)) { > ret = PTR_ERR(trans); > Thanks for your detailed review! Reason of creating new bg is: Balance operation may be used to change raid-profile of the filesystem. If we want to change filesystem form raid1 to raid5, we need: 1: delete all raid1 bgs 2: move all data to raid5 bg if data exist in filesystem 3: reserve only one blank raid5 bg if no data in filesystem If we use similar operation as patch 1/2(not delete lattest blockgroup), we'll have following problem: 1: the old raid1 blockgroups are not all-deleted 2: if no data in filesystem, the profile will not changed from raid1 to raid5. I understand your worry of "NO_SPACE" when create additional bg, and I also considered it in making patch, but I choose to use this way because: 1: balance operation had check free space before operation (In front of __btrfs_balance) 2: for filesystem with data, we have to create target-chunk in balance operation, this patch only make "creating-chunk" earlier, and the created chunk will be used to save data in balance operation, and reduce one chunk-allocate in balance operation. 3: for a blank filesystem, it is necessary to create a blank chunk with of target raid profile. the old code hadn't create it, and this patch created. It is right operation even if cause no-space. Actually, create a chunk in empty filesystem rarely cause no-space, plus we have free-space check in 1. 4: 1~3 ensure this patch rarely cause additional no-space problem in logic. And if it really caused additional no-space(if out of my consideration), We can change code of 1 to avoid it. Thanks Zhaolei > (also at https://friendpaste.com/5IeAIIzBv3oKhureKfvjwm) > > thanks > > > > > >> Also, can't this be ineffective when the data block group we allocate > >> ends up with a key in the chunk_root that is lower than the key we > >> found in the current iteration? I.e., key found in current iteration > >> has object id B, we allocate a new block group which gets a key with > >> object id A, where A < B and the next iteration of our loop sees key > >> A, deletes the respective block group A if it's empty. In the end we > >> have no data block groups, assuming that before A there are no other > non-empty data block groups. > >> Your example works because there's no free space before the offset > >> where the chunk starts in the device. > >> > > New chunk will always use increased logic address, even if there are > > "hole" before, so A's logic address will always >B. > > And current code of balance also use this feature to avoid balance > > new-created chunks(which was created by balance operation itself). > > > > Thanks > > Zhaolei > > > >> thanks > >> > >> > + > >> > ret = btrfs_relocate_chunk(chunk_root, > >> > found_key.offset); > >> > mutex_unlock(&fs_info->delete_unused_bgs_mutex); > >> > -- > >> > 1.8.5.1 > >> > > >> > -- > >> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" > >> > in the body of a message to majordomo@vger.kernel.org More > >> > majordomo info at http://vger.kernel.org/majordomo-info.html > >> > >> > >> > >> -- > >> Filipe David Manana, > >> > >> "Reasonable men adapt themselves to the world. > >> Unreasonable men adapt the world to themselves. > >> That's why all progress depends on unreasonable men." > > > > > > -- > Filipe David Manana, > > "Reasonable men adapt themselves to the world. > Unreasonable men adapt the world to themselves. > That's why all progress depends on unreasonable men."