From: Zygo Blaxell
To: Jakub Husák
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Balancing raid5 after adding another disk does not move/use any data on it
Date: Fri, 15 Mar 2019 14:59:46 -0400
Message-ID: <20190315185946.GK9995@hungrycats.org>
References: <7a713010-5db6-2627-2593-8e13092868b1@husak.pro>
 <20190315180123.GJ9995@hungrycats.org>
X-Mailing-List: linux-btrfs@vger.kernel.org
User-Agent: Mutt/1.10.1 (2018-07-13)

On Fri, Mar 15, 2019 at 07:42:21PM +0100, Jakub Husák wrote:
> Thanks for the explanation!
> Actually, when I moved forward with the rebalancing, the fourth disk
> started to receive some data.
>
> BTW, I was hoping some filter like '-dstripes=1..3' existed, and it does!
> Wouldn't it deserve some documentation? :)

It has some, from the man page for btrfs-balance:

    stripes=
        Balance only block groups which have the given number of stripes.
        The parameter is a range specified as start..end. Makes sense for
        block group profiles that utilize striping, ie. RAID0/10/5/6.
        The range minimum and maximum are inclusive.

There are probably some wikis that could benefit from a sentence or two
explaining when you'd use this option.  Or a table of which RAID profiles
must be balanced after a device add (always raid0, raid5, raid6; sometimes
raid1 and raid10) and which don't (never single, dup; sometimes raid1
and raid10).

> Also thanks to Noah Massey for caring!
>
> Cheers
>
> On 15. 03. 19 19:01, Zygo Blaxell wrote:
> > On Wed, Mar 13, 2019 at 11:11:02PM +0100, Jakub Husák wrote:
> > > Sorry, fighting with this technology called "email" :)
> > >
> > > Hopefully better wrapped outputs:
> > >
> > > On 13. 03. 19 22:58, Jakub Husák wrote:
> > > > Hi,
> > > >
> > > > I added another disk to my 3-disk raid5 and ran a balance command.
> > > > After a few hours I looked at the output of `fi usage` and saw that
> > > > no data was being used on the new disk. I got the same result even
> > > > when balancing my raid5 data or metadata.
> > > >
> > > > Next I tried to convert my raid5 metadata to raid1 (a good idea
> > > > anyway) and the new disk started to fill immediately (even though it
> > > > received the whole amount of metadata with replicas being spread
> > > > among the other drives, instead of being really "balanced". I know
> > > > why this happened, I don't like it but I can live with it, let's not
> > > > go off topic here :)).
> > > >
> > > > Now my usage output looks like this:
> > >
> > > # btrfs filesystem usage /mnt/data1
> > > WARNING: RAID56 detected, not implemented
> > > Overall:
> > >     Device size:             10.91TiB
> > >     Device allocated:       316.12GiB
> > >     Device unallocated:      10.61TiB
> > >     Device missing:             0.00B
> > >     Used:                    58.86GiB
> > >     Free (estimated):           0.00B    (min: 8.00EiB)
> > >     Data ratio:                  0.00
> > >     Metadata ratio:              2.00
> > >     Global reserve:         512.00MiB    (used: 0.00B)
> > >
> > > Data,RAID5: Size:4.59TiB, Used:4.06TiB
> > >    /dev/mapper/crypt-sdb       2.29TiB
> > >    /dev/mapper/crypt-sdc       2.29TiB
> > >    /dev/mapper/crypt-sde       2.29TiB
> > >
> > > Metadata,RAID1: Size:158.00GiB, Used:29.43GiB
> > >    /dev/mapper/crypt-sdb      53.00GiB
> > >    /dev/mapper/crypt-sdc      53.00GiB
> > >    /dev/mapper/crypt-sdd     158.00GiB
> > >    /dev/mapper/crypt-sde      52.00GiB
> > >
> > > System,RAID1: Size:64.00MiB, Used:528.00KiB
> > >    /dev/mapper/crypt-sdc      32.00MiB
> > >    /dev/mapper/crypt-sdd      64.00MiB
> > >    /dev/mapper/crypt-sde      32.00MiB
> > >
> > > Unallocated:
> > >    /dev/mapper/crypt-sdb     393.04GiB
> > >    /dev/mapper/crypt-sdc     393.01GiB
> > >    /dev/mapper/crypt-sdd       2.57TiB
> > >    /dev/mapper/crypt-sde     394.01GiB
> > >
> > > > I'm now running `fi balance -dusage=10` (and raising the usage
> > > > limit). I can see that the unallocated space is rising as it's
> > > > freeing the little-used chunks, but still no data are being stored
> > > > on the new disk.
> >
> > That is exactly what is happening: you are moving tiny amounts of data
> > into existing big empty spaces, so no new chunk allocations (which
> > should use the new drive) are happening.  You have 470GB of data
> > allocated but not used, so you have up to 235 block groups to fill
> > before the new drive gets any data.
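To make that arithmetic explicit, here is a small sketch (plain Python,
not btrfs code; it assumes the common case where a data block group takes
one 1 GiB stripe per device, so a 3-device raid5 block group holds 2 GiB
of data plus 1 GiB of parity):

```python
# Sketch of the "470GB -> 235 block groups" arithmetic above.
# Assumption: each raid5 data block group allocates a 1 GiB stripe on
# every device, and one stripe's worth of that space is parity, so an
# N-device block group stores (N - 1) GiB of usable data.

def raid5_block_groups_to_fill(slack_gib: int, ndevs: int) -> int:
    """Number of existing, partly-empty block groups that must fill up
    before the allocator needs a brand-new chunk (which is what would
    finally place data on the newly added disk)."""
    usable_per_bg_gib = ndevs - 1   # one device's stripe holds parity
    return slack_gib // usable_per_bg_gib

# ~470 GiB allocated but unused, spread over 3-device raid5 block groups:
print(raid5_block_groups_to_fill(470, 3))   # -> 235
```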
> >
> > Also note that you always have to do a full data balance when adding
> > devices to raid5 in order to make use of all the space, so you might
> > as well get started on that now.  It'll take a while.  'btrfs balance
> > start -dstripes=1..3 /mnt/data1' will work for this case.
> >
> > > > Is it some bug? Is `fi usage` not showing me something (as it
> > > > states "WARNING: RAID56 detected, not implemented")?
> >
> > The warning just means the fields in the 'fi usage' output header,
> > like "Free (estimated)", have bogus values because they're not
> > computed correctly.
> >
> > > > Or is there just too much free space on the first set of disks,
> > > > so the balancing is not bothering to move any data?
> >
> > Yes.  ;)
> >
> > > > If so, shouldn't it be really balancing (spreading) the data among
> > > > all the drives to use all the IOPS capacity, even when the raid5
> > > > redundancy constraint is currently satisfied?
> >
> > btrfs divides the disks into chunks first, then spreads the data
> > across the chunks.  The chunk allocation behavior spreads chunks
> > across all the disks.  When you are adding a disk to raid5, you have
> > to redistribute all the old data across all the disks to get balanced
> > IOPS and space usage, hence the full balance requirement.
> >
> > If you don't do a full balance, it will eventually allocate data on
> > all disks, but it will run out of space on sdb, sdc, and sde first,
> > and then be unable to use the remaining 2TB+ on sdd.
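The failure mode in that last paragraph can be shown with a toy allocator
model (an illustrative sketch only, not the real kernel allocator; it
assumes every new raid5 chunk takes a 1 GiB stripe from each device that
still has unallocated space, and that a raid5 chunk needs stripes on at
least 2 devices):

```python
# Toy model of raid5 chunk allocation: each new chunk grabs a 1 GiB
# stripe from every device with free space; one stripe per chunk is
# parity; a chunk needs stripes on at least 2 devices to exist at all.

def allocatable_data_gib(unallocated):
    """Return (usable data GiB still allocatable, leftover free space)
    given per-device unallocated GiB, without rebalancing old data."""
    free = dict(unallocated)
    data = 0
    while True:
        devs = [d for d, gib in free.items() if gib >= 1]
        if len(devs) < 2:            # raid5 can't allocate on one device
            break
        for d in devs:
            free[d] -= 1             # one stripe per participating device
        data += len(devs) - 1        # all but one stripe hold data
    return data, free

# Roughly the unallocated space from the 'fi usage' output above:
data, left = allocatable_data_gib(
    {"sdb": 393, "sdc": 393, "sdd": 2632, "sde": 394})
print(data)   # data capacity reachable without a full balance
print(left)   # sdd ends up with ~2.2 TiB that raid5 cannot use alone
```

In this model sdb, sdc, and sde hit zero together, after which sdd is the
only device with free space and no further raid5 chunk can be made.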
> >
> > > # uname -a
> > > Linux storage 4.19.0-0.bpo.2-amd64 #1 SMP Debian 4.19.16-1~bpo9+1
> > > (2019-02-07) x86_64 GNU/Linux
> > > # btrfs --version
> > > btrfs-progs v4.17
> > > # btrfs fi show
> > > Label: none  uuid: xxxxxxxxxxxxxxxxx
> > >     Total devices 4 FS bytes used 4.09TiB
> > >     devid    2 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sdc
> > >     devid    3 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sdb
> > >     devid    4 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sde
> > >     devid    5 size 2.73TiB used 158.06GiB path /dev/mapper/crypt-sdd
> > >
> > > # btrfs fi df .
> > > Data, RAID5: total=4.59TiB, used=4.06TiB
> > > System, RAID1: total=64.00MiB, used=528.00KiB
> > > Metadata, RAID1: total=158.00GiB, used=29.43GiB
> > > GlobalReserve, single: total=512.00MiB, used=0.00B
> > >
> > > > Thanks
> > > >
> > > > Jakub
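P.S. A footnote on the stripes filter mentioned above: its selection
logic is simple enough to sketch (a hypothetical model of the filter's
range test, not btrfs-progs code). It also shows why '-dstripes=1..3' is
the right filter here: old 3-stripe block groups match and get rewritten
across all 4 devices, while already-rebalanced 4-stripe groups are
skipped.

```python
# Hypothetical sketch of the stripes=start..end balance filter test:
# a block group matches when its stripe count lies in the INCLUSIVE
# range (the man page says minimum and maximum are both inclusive).

def stripes_filter(bg_stripes: int, start: int, end: int) -> bool:
    """True if a block group with bg_stripes stripes matches
    stripes=start..end."""
    return start <= bg_stripes <= end

# On the 4-device filesystem above, with -dstripes=1..3:
for n in (1, 2, 3, 4):
    print(n, stripes_filter(n, 1, 3))
```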