* Re: Balancing raid5 after adding another disk does not move/use any data on it
@ 2019-03-13 22:11 Jakub Husák
  2019-03-14 14:59 ` Noah Massey
  2019-03-15 18:01 ` Zygo Blaxell
  0 siblings, 2 replies; 16+ messages in thread
From: Jakub Husák @ 2019-03-13 22:11 UTC (permalink / raw)
To: linux-btrfs

Sorry, fighting with this technology called "email" :)

Hopefully better wrapped outputs:

On 13. 03. 19 22:58, Jakub Husák wrote:
> Hi,
>
> I added another disk to my 3-disk raid5 and ran a balance command.
> After a few hours I looked at the output of `fi usage` and saw that no
> data was being placed on the new disk. I got the same result even when
> balancing my raid5 data or metadata.
>
> Next I tried to convert my raid5 metadata to raid1 (a good idea
> anyway) and the new disk started to fill immediately (even though it
> received the whole amount of metadata, with replicas being spread
> among the other drives, instead of being really "balanced". I know why
> this happened, I don't like it but I can live with it, let's not go
> off topic here :)).
>
> Now my usage output looks like this:
>
# btrfs filesystem usage /mnt/data1
WARNING: RAID56 detected, not implemented
Overall:
    Device size:                  10.91TiB
    Device allocated:            316.12GiB
    Device unallocated:           10.61TiB
    Device missing:                  0.00B
    Used:                         58.86GiB
    Free (estimated):                0.00B      (min: 8.00EiB)
    Data ratio:                       0.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID5: Size:4.59TiB, Used:4.06TiB
   /dev/mapper/crypt-sdb   2.29TiB
   /dev/mapper/crypt-sdc   2.29TiB
   /dev/mapper/crypt-sde   2.29TiB

Metadata,RAID1: Size:158.00GiB, Used:29.43GiB
   /dev/mapper/crypt-sdb  53.00GiB
   /dev/mapper/crypt-sdc  53.00GiB
   /dev/mapper/crypt-sdd 158.00GiB
   /dev/mapper/crypt-sde  52.00GiB

System,RAID1: Size:64.00MiB, Used:528.00KiB
   /dev/mapper/crypt-sdc  32.00MiB
   /dev/mapper/crypt-sdd  64.00MiB
   /dev/mapper/crypt-sde  32.00MiB

Unallocated:
   /dev/mapper/crypt-sdb 393.04GiB
   /dev/mapper/crypt-sdc 393.01GiB
   /dev/mapper/crypt-sdd   2.57TiB
   /dev/mapper/crypt-sde 394.01GiB

> I'm now running `fi balance -dusage=10` (and raising the usage limit).
> I can see that the unallocated space is rising as it frees the
> little-used chunks, but still no data is being stored on the new disk.
>
> Is it some bug? Is `fi usage` not showing me something (as it states
> "WARNING: RAID56 detected, not implemented")? Or is there just too
> much free space on the first set of disks, so the balance is not
> bothering to move any data?
>
> If so, shouldn't it really be balancing (spreading) the data among all
> the drives to use all the IOPS capacity, even when the raid5
> redundancy constraint is currently satisfied?
>
# uname -a
Linux storage 4.19.0-0.bpo.2-amd64 #1 SMP Debian 4.19.16-1~bpo9+1 (2019-02-07) x86_64 GNU/Linux
# btrfs --version
btrfs-progs v4.17
# btrfs fi show
Label: none  uuid: xxxxxxxxxxxxxxxxx
        Total devices 4 FS bytes used 4.09TiB
        devid    2 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sdc
        devid    3 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sdb
        devid    4 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sde
        devid    5 size 2.73TiB used 158.06GiB path /dev/mapper/crypt-sdd

# btrfs fi df .
Data, RAID5: total=4.59TiB, used=4.06TiB
System, RAID1: total=64.00MiB, used=528.00KiB
Metadata, RAID1: total=158.00GiB, used=29.43GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

> Thanks
>
> Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread
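The raid5 numbers in the outputs above are self-consistent: with data striped over three devices, one device's worth of every chunk goes to parity. A quick check (an illustrative sketch, not part of the thread; the figures are taken from the `fi usage` output):

```python
# Check that the per-device raid5 allocation shown in `fi usage`
# matches the reported usable size. For raid5 over n devices,
# usable space = raw space * (n - 1) / n (one stripe is parity).
n = 3                           # data chunks currently span sdb, sdc, sde only
raw_per_dev_tib = 2.29          # per-device Data,RAID5 allocation from above
raw_total = n * raw_per_dev_tib            # 6.87 TiB of raw space
usable = raw_total * (n - 1) / n           # parity costs 1/n of the raw space
print(f"usable raid5 size: {usable:.2f} TiB")  # ~4.58 TiB vs Size:4.59TiB
```

The small difference against the reported 4.59 TiB is rounding in the per-device figures.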
* Re: Balancing raid5 after adding another disk does not move/use any data on it
  2019-03-13 22:11 Balancing raid5 after adding another disk does not move/use any data on it Jakub Husák
@ 2019-03-14 14:59 ` Noah Massey
  2019-03-14 15:08   ` Noah Massey
  2019-03-15 18:01 ` Zygo Blaxell
  1 sibling, 1 reply; 16+ messages in thread
From: Noah Massey @ 2019-03-14 14:59 UTC (permalink / raw)
To: Jakub Husák; +Cc: linux-btrfs

On Wed, Mar 13, 2019 at 6:13 PM Jakub Husák <jakub@husak.pro> wrote:
>
> Sorry, fighting with this technology called "email" :)
>
> Hopefully better wrapped outputs:
>
> On 13. 03. 19 22:58, Jakub Husák wrote:
> >
> > Hi,
> >
> > I added another disk to my 3-disk raid5 and ran a balance command.
> > After few hours I looked to output of `fi usage` to see that no data
> > are being used on the new disk. I got the same result even when
> > balancing my raid5 data or metadata.
>

Am I correct in rephrasing your issue into "balancing 3 copy raid5 does
not rebalance into 4 copy raid5"? Because that was a surprising result
to me, but may be a spec that I wasn't aware of.

> > # btrfs filesystem usage /mnt/data1
>
> Data,RAID5: Size:4.59TiB, Used:4.06TiB
>    /dev/mapper/crypt-sdb   2.29TiB
>    /dev/mapper/crypt-sdc   2.29TiB
>    /dev/mapper/crypt-sde   2.29TiB
>
> Metadata,RAID1: Size:158.00GiB, Used:29.43GiB
>    /dev/mapper/crypt-sdb  53.00GiB
>    /dev/mapper/crypt-sdc  53.00GiB
>    /dev/mapper/crypt-sdd 158.00GiB
>    /dev/mapper/crypt-sde  52.00GiB

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: Balancing raid5 after adding another disk does not move/use any data on it
  2019-03-14 14:59 ` Noah Massey
@ 2019-03-14 15:08   ` Noah Massey
  0 siblings, 0 replies; 16+ messages in thread
From: Noah Massey @ 2019-03-14 15:08 UTC (permalink / raw)
To: Jakub Husák; +Cc: linux-btrfs

On Thu, Mar 14, 2019 at 10:59 AM Noah Massey <noah.massey@gmail.com> wrote:
>
> On Wed, Mar 13, 2019 at 6:13 PM Jakub Husák <jakub@husak.pro> wrote:
> >
> > Sorry, fighting with this technology called "email" :)
> >
> > Hopefully better wrapped outputs:
> >
> > On 13. 03. 19 22:58, Jakub Husák wrote:
> > >
> > > Hi,
> > >
> > > I added another disk to my 3-disk raid5 and ran a balance command.
> > > After few hours I looked to output of `fi usage` to see that no data
> > > are being used on the new disk. I got the same result even when
> > > balancing my raid5 data or metadata.
> >
> Am I correct in rephrasing your issue into "balancing 3 copy raid5
> does not rebalance into 4 copy raid5"? Because that was a surprising
> result to me, but may be a spec that I wasn't aware of.
>

Maybe it's because the new disk does not have any RAID5 block groups?
To test, I'd usually suggest adding bogus data until something got
pushed to sdd, then try a balance and remove the temp data. In your
case, it might be hard since all 3 disks have a similar amount of
unused space.

What happens if you 'dev remove' one of the old drives, and then add
it back in?

In case it's not clear, I'm throwing spaghetti here. Have backups.

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: Balancing raid5 after adding another disk does not move/use any data on it
  2019-03-13 22:11 Balancing raid5 after adding another disk does not move/use any data on it Jakub Husák
  2019-03-14 14:59 ` Noah Massey
@ 2019-03-15 18:01 ` Zygo Blaxell
  2019-03-15 18:42   ` Jakub Husák
  2019-03-15 20:31   ` Hans van Kranenburg
  1 sibling, 2 replies; 16+ messages in thread
From: Zygo Blaxell @ 2019-03-15 18:01 UTC (permalink / raw)
To: Jakub Husák; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 4983 bytes --]

On Wed, Mar 13, 2019 at 11:11:02PM +0100, Jakub Husák wrote:
> Sorry, fighting with this technology called "email" :)
>
> Hopefully better wrapped outputs:
>
> On 13. 03. 19 22:58, Jakub Husák wrote:
> >
> > Hi,
> >
> > I added another disk to my 3-disk raid5 and ran a balance command. After
> > few hours I looked to output of `fi usage` to see that no data are being
> > used on the new disk. I got the same result even when balancing my raid5
> > data or metadata.
> >
> > Next I tried to convert my raid5 metadata to raid1 (a good idea anyway)
> > and the new disk started to fill immediately (even though it received
> > the whole amount of metadata with replicas being spread among the other
> > drives, instead of being really "balanced". I know why this happened, I
> > don't like it but I can live with it, let's not go off topic here :)).
> >
> > Now my usage output looks like this:
> >
> # btrfs filesystem usage /mnt/data1
> WARNING: RAID56 detected, not implemented
> Overall:
>     Device size:                  10.91TiB
>     Device allocated:            316.12GiB
>     Device unallocated:           10.61TiB
>     Device missing:                  0.00B
>     Used:                         58.86GiB
>     Free (estimated):                0.00B      (min: 8.00EiB)
>     Data ratio:                       0.00
>     Metadata ratio:                   2.00
>     Global reserve:              512.00MiB      (used: 0.00B)
>
> Data,RAID5: Size:4.59TiB, Used:4.06TiB
>    /dev/mapper/crypt-sdb   2.29TiB
>    /dev/mapper/crypt-sdc   2.29TiB
>    /dev/mapper/crypt-sde   2.29TiB
>
> Metadata,RAID1: Size:158.00GiB, Used:29.43GiB
>    /dev/mapper/crypt-sdb  53.00GiB
>    /dev/mapper/crypt-sdc  53.00GiB
>    /dev/mapper/crypt-sdd 158.00GiB
>    /dev/mapper/crypt-sde  52.00GiB
>
> System,RAID1: Size:64.00MiB, Used:528.00KiB
>    /dev/mapper/crypt-sdc  32.00MiB
>    /dev/mapper/crypt-sdd  64.00MiB
>    /dev/mapper/crypt-sde  32.00MiB
>
> Unallocated:
>    /dev/mapper/crypt-sdb 393.04GiB
>    /dev/mapper/crypt-sdc 393.01GiB
>    /dev/mapper/crypt-sdd   2.57TiB
>    /dev/mapper/crypt-sde 394.01GiB
>
> > I'm now running `fi balance -dusage=10` (and rising the usage limit). I
> > can see that the unallocated space is rising as it's freeing the little
> > used chunks but still no data are being stored on the new disk.

That is exactly what is happening: you are moving tiny amounts of data
into existing big empty spaces, so no new chunk allocations (which should
use the new drive) are happening.  You have 470GB of data allocated
but not used, so you have up to 235 block groups to fill before the new
drive gets any data.

Also note that you always have to do a full data balance when adding
devices to raid5 in order to make use of all the space, so you might
as well get started on that now.  It'll take a while.  'btrfs balance
start -dstripes=1..3 /mnt/data1' will work for this case.

> > I it some bug? Is `fi usage` not showing me something (as it states
> > "WARNING: RAID56 detected, not implemented")?

The warning just means the fields in the 'fi usage' output header,
like "Free (estimate)", have bogus values because they're not computed
correctly.

> > Or is there just too much
> > free space on the first set of disks that the balancing is not bothering
> > moving any data?

Yes.  ;)

> > If so, shouldn't it be really balancing (spreading) the data among all
> > the drives to use all the IOPS capacity, even when the raid5 redundancy
> > constraint is currently satisfied?

btrfs divides the disks into chunks first, then spreads the data across
the chunks.  The chunk allocation behavior spreads chunks across all the
disks.  When you are adding a disk to raid5, you have to redistribute all
the old data across all the disks to get balanced IOPS and space usage,
hence the full balance requirement.

If you don't do a full balance, it will eventually allocate data on
all disks, but it will run out of space on sdb, sdc, and sde first,
and then be unable to use the remaining 2TB+ on sdd.

> # uname -a
> Linux storage 4.19.0-0.bpo.2-amd64 #1 SMP Debian 4.19.16-1~bpo9+1
> (2019-02-07) x86_64 GNU/Linux
> # btrfs --version
> btrfs-progs v4.17
> # btrfs fi show
> Label: none  uuid: xxxxxxxxxxxxxxxxx
>         Total devices 4 FS bytes used 4.09TiB
>         devid    2 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sdc
>         devid    3 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sdb
>         devid    4 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sde
>         devid    5 size 2.73TiB used 158.06GiB path /dev/mapper/crypt-sdd
>
> # btrfs fi df .
> Data, RAID5: total=4.59TiB, used=4.06TiB
> System, RAID1: total=64.00MiB, used=528.00KiB
> Metadata, RAID1: total=158.00GiB, used=29.43GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> > Thanks
> >
> > Jakub

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread
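Zygo's last point can be illustrated with a toy model of the chunk allocator (a sketch under simplifying assumptions: 1 GiB stripes, new chunks span every device that still has unallocated space, and a raid5 chunk needs at least two devices; the starting free-space figures are the "Unallocated" numbers from the usage output, in GiB):

```python
# Toy model of btrfs raid5 chunk allocation without a full balance:
# each new chunk takes a 1 GiB stripe from every device that still
# has unallocated space, and a raid5 chunk needs >= 2 devices
# (data + parity). Starting point: the "Unallocated" output, in GiB.
free = {"sdb": 393, "sdc": 393, "sdd": 2570, "sde": 394}

def fill_without_balance(free, min_devs=2):
    """Allocate chunks until too few devices have free space left."""
    free = dict(free)
    widths = []  # stripe count of each chunk allocated
    while True:
        devs = [d for d in free if free[d] >= 1]
        if len(devs) < min_devs:
            break  # only one device left: its space is unallocatable
        for d in devs:
            free[d] -= 1
        widths.append(len(devs))
    return widths, free

widths, left = fill_without_balance(free)
print(f"chunks allocated: {len(widths)}, stranded on sdd: {left['sdd']} GiB")
```

In this model roughly 2.1 TiB ends up stranded on sdd once the other three devices fill, matching the "2TB+" figure above; a full balance avoids that by rewriting the old 3-wide chunks 4 devices wide.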
* Re: Balancing raid5 after adding another disk does not move/use any data on it
  2019-03-15 18:01 ` Zygo Blaxell
@ 2019-03-15 18:42   ` Jakub Husák
  2019-03-15 18:59     ` Zygo Blaxell
  2019-03-15 20:31   ` Hans van Kranenburg
  1 sibling, 1 reply; 16+ messages in thread
From: Jakub Husák @ 2019-03-15 18:42 UTC (permalink / raw)
To: linux-btrfs

Thanks for the explanation! Actually, when I moved forward with the
rebalancing, the fourth disk started to receive some data.

BTW, I was hoping some filter like '-dstripes=1..3' existed, and it
does! Wouldn't it deserve some documentation? :)

Also thanks to Noah Massey for caring!

Cheers

On 15. 03. 19 19:01, Zygo Blaxell wrote:
> On Wed, Mar 13, 2019 at 11:11:02PM +0100, Jakub Husák wrote:
>> Sorry, fighting with this technology called "email" :)
>>
>> Hopefully better wrapped outputs:
>>
>> On 13. 03. 19 22:58, Jakub Husák wrote:
>>
>>> Hi,
>>>
>>> I added another disk to my 3-disk raid5 and ran a balance command. After
>>> few hours I looked to output of `fi usage` to see that no data are being
>>> used on the new disk. I got the same result even when balancing my raid5
>>> data or metadata.
>>>
>>> Next I tried to convert my raid5 metadata to raid1 (a good idea anyway)
>>> and the new disk started to fill immediately (even though it received
>>> the whole amount of metadata with replicas being spread among the other
>>> drives, instead of being really "balanced". I know why this happened, I
>>> don't like it but I can live with it, let's not go off topic here :)).
>>> >>> Now my usage output looks like this: >>> >> # btrfs filesystem usage /mnt/data1 >> WARNING: RAID56 detected, not implemented >> Overall: >> Device size: 10.91TiB >> Device allocated: 316.12GiB >> Device unallocated: 10.61TiB >> Device missing: 0.00B >> Used: 58.86GiB >> Free (estimated): 0.00B (min: 8.00EiB) >> Data ratio: 0.00 >> Metadata ratio: 2.00 >> Global reserve: 512.00MiB (used: 0.00B) >> >> Data,RAID5: Size:4.59TiB, Used:4.06TiB >> /dev/mapper/crypt-sdb 2.29TiB >> /dev/mapper/crypt-sdc 2.29TiB >> /dev/mapper/crypt-sde 2.29TiB >> >> Metadata,RAID1: Size:158.00GiB, Used:29.43GiB >> /dev/mapper/crypt-sdb 53.00GiB >> /dev/mapper/crypt-sdc 53.00GiB >> /dev/mapper/crypt-sdd 158.00GiB >> /dev/mapper/crypt-sde 52.00GiB >> >> System,RAID1: Size:64.00MiB, Used:528.00KiB >> /dev/mapper/crypt-sdc 32.00MiB >> /dev/mapper/crypt-sdd 64.00MiB >> /dev/mapper/crypt-sde 32.00MiB >> >> Unallocated: >> /dev/mapper/crypt-sdb 393.04GiB >> /dev/mapper/crypt-sdc 393.01GiB >> /dev/mapper/crypt-sdd 2.57TiB >> /dev/mapper/crypt-sde 394.01GiB >> >>> I'm now running `fi balance -dusage=10` (and rising the usage limit). I >>> can see that the unallocated space is rising as it's freeing the little >>> used chunks but still no data are being stored on the new disk. > That is exactly what is happening: you are moving tiny amounts of data > into existing big empty spaces, so no new chunk allocations (which should > use the new drive) are happening. You have 470GB of data allocated > but not used, so you have up to 235 block groups to fill before the new > drive gets any data. > > Also note that you always have to do a full data balance when adding > devices to raid5 in order to make use of all the space, so you might > as well get started on that now. It'll take a while. 'btrfs balance > start -dstripes=1..3 /mnt/data1' will work for this case. > >>> I it some bug? Is `fi usage` not showing me something (as it states >>> "WARNING: RAID56 detected, not implemented")? 
> The warning just means the fields in the 'fi usage' output header, > like "Free (estimate)", have bogus values because they're not computed > correctly. > >>> Or is there just too much >>> free space on the first set of disks that the balancing is not bothering >>> moving any data? > Yes. ;) > >>> If so, shouldn't it be really balancing (spreading) the data among all >>> the drives to use all the IOPS capacity, even when the raid5 redundancy >>> constraint is currently satisfied? > btrfs divides the disks into chunks first, then spreads the data across > the chunks. The chunk allocation behavior spreads chunks across all the > disks. When you are adding a disk to raid5, you have to redistribute all > the old data across all the disks to get balanced IOPS and space usage, > hence the full balance requirement. > > If you don't do a full balance, it will eventually allocate data on > all disks, but it will run out of space on sdb, sdc, and sde first, > and then be unable to use the remaining 2TB+ on sdd. > >> # uname -a >> Linux storage 4.19.0-0.bpo.2-amd64 #1 SMP Debian 4.19.16-1~bpo9+1 >> (2019-02-07) x86_64 GNU/Linux >> # btrfs --version >> btrfs-progs v4.17 >> # btrfs fi show >> Label: none uuid: xxxxxxxxxxxxxxxxx >> Total devices 4 FS bytes used 4.09TiB >> devid 2 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sdc >> devid 3 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sdb >> devid 4 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sde >> devid 5 size 2.73TiB used 158.06GiB path /dev/mapper/crypt-sdd >> >> # btrfs fi df . >> Data, RAID5: total=4.59TiB, used=4.06TiB >> System, RAID1: total=64.00MiB, used=528.00KiB >> Metadata, RAID1: total=158.00GiB, used=29.43GiB >> GlobalReserve, single: total=512.00MiB, used=0.00B >> >>> Thanks >>> >>> Jakub >>> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Balancing raid5 after adding another disk does not move/use any data on it
  2019-03-15 18:42 ` Jakub Husák
@ 2019-03-15 18:59   ` Zygo Blaxell
  0 siblings, 0 replies; 16+ messages in thread
From: Zygo Blaxell @ 2019-03-15 18:59 UTC (permalink / raw)
To: Jakub Husák; +Cc: linux-btrfs

On Fri, Mar 15, 2019 at 07:42:21PM +0100, Jakub Husák wrote:
> Thanks for explanation! actually when I moved forward with the rebalancing
> the fourth disk started to receive some data.
>
> BTW, I was hoping some filter like '-dstripes=1..3' existed and it is!
> Wouldn't it deserve some documentation? :)

It has some, from the man page for btrfs-balance:

       stripes=<range>
           Balance only block groups which have the given number of stripes.
           The parameter is a range specified as start..end. Makes sense for
           block group profiles that utilize striping, ie. RAID0/10/5/6.
           The range minimum and maximum are inclusive.

There are probably some wikis that could benefit from a sentence or two
explaining when you'd use this option.  Or a table of which RAID profiles
must be balanced after a device add (always raid0, raid5, raid6, sometimes
raid1 and raid10) and which don't (never single, dup, sometimes raid1
and raid10).

> Also thanks to Noah Massey for caring!
>
> Cheers
>
> On 15. 03. 19 19:01, Zygo Blaxell wrote:
> > On Wed, Mar 13, 2019 at 11:11:02PM +0100, Jakub Husák wrote:
> > > Sorry, fighting with this technology called "email" :)
> > >
> > > Hopefully better wrapped outputs:
> > >
> > > On 13. 03. 19 22:58, Jakub Husák wrote:
> > >
> > > > Hi,
> > > >
> > > > I added another disk to my 3-disk raid5 and ran a balance command. After
> > > > few hours I looked to output of `fi usage` to see that no data are being
> > > > used on the new disk. I got the same result even when balancing my raid5
> > > > data or metadata.
> > > > > > > > Next I tried to convert my raid5 metadata to raid1 (a good idea anyway) > > > > and the new disk started to fill immediately (even though it received > > > > the whole amount of metadata with replicas being spread among the other > > > > drives, instead of being really "balanced". I know why this happened, I > > > > don't like it but I can live with it, let's not go off topic here :)). > > > > > > > > Now my usage output looks like this: > > > > > > > # btrfs filesystem usage /mnt/data1 > > > WARNING: RAID56 detected, not implemented > > > Overall: > > > Device size: 10.91TiB > > > Device allocated: 316.12GiB > > > Device unallocated: 10.61TiB > > > Device missing: 0.00B > > > Used: 58.86GiB > > > Free (estimated): 0.00B (min: 8.00EiB) > > > Data ratio: 0.00 > > > Metadata ratio: 2.00 > > > Global reserve: 512.00MiB (used: 0.00B) > > > > > > Data,RAID5: Size:4.59TiB, Used:4.06TiB > > > /dev/mapper/crypt-sdb 2.29TiB > > > /dev/mapper/crypt-sdc 2.29TiB > > > /dev/mapper/crypt-sde 2.29TiB > > > > > > Metadata,RAID1: Size:158.00GiB, Used:29.43GiB > > > /dev/mapper/crypt-sdb 53.00GiB > > > /dev/mapper/crypt-sdc 53.00GiB > > > /dev/mapper/crypt-sdd 158.00GiB > > > /dev/mapper/crypt-sde 52.00GiB > > > > > > System,RAID1: Size:64.00MiB, Used:528.00KiB > > > /dev/mapper/crypt-sdc 32.00MiB > > > /dev/mapper/crypt-sdd 64.00MiB > > > /dev/mapper/crypt-sde 32.00MiB > > > > > > Unallocated: > > > /dev/mapper/crypt-sdb 393.04GiB > > > /dev/mapper/crypt-sdc 393.01GiB > > > /dev/mapper/crypt-sdd 2.57TiB > > > /dev/mapper/crypt-sde 394.01GiB > > > > > > > I'm now running `fi balance -dusage=10` (and rising the usage limit). I > > > > can see that the unallocated space is rising as it's freeing the little > > > > used chunks but still no data are being stored on the new disk. > > That is exactly what is happening: you are moving tiny amounts of data > > into existing big empty spaces, so no new chunk allocations (which should > > use the new drive) are happening. 
You have 470GB of data allocated > > but not used, so you have up to 235 block groups to fill before the new > > drive gets any data. > > > > Also note that you always have to do a full data balance when adding > > devices to raid5 in order to make use of all the space, so you might > > as well get started on that now. It'll take a while. 'btrfs balance > > start -dstripes=1..3 /mnt/data1' will work for this case. > > > > > > I it some bug? Is `fi usage` not showing me something (as it states > > > > "WARNING: RAID56 detected, not implemented")? > > The warning just means the fields in the 'fi usage' output header, > > like "Free (estimate)", have bogus values because they're not computed > > correctly. > > > > > > Or is there just too much > > > > free space on the first set of disks that the balancing is not bothering > > > > moving any data? > > Yes. ;) > > > > > > If so, shouldn't it be really balancing (spreading) the data among all > > > > the drives to use all the IOPS capacity, even when the raid5 redundancy > > > > constraint is currently satisfied? > > btrfs divides the disks into chunks first, then spreads the data across > > the chunks. The chunk allocation behavior spreads chunks across all the > > disks. When you are adding a disk to raid5, you have to redistribute all > > the old data across all the disks to get balanced IOPS and space usage, > > hence the full balance requirement. > > > > If you don't do a full balance, it will eventually allocate data on > > all disks, but it will run out of space on sdb, sdc, and sde first, > > and then be unable to use the remaining 2TB+ on sdd. 
> > > > > # uname -a > > > Linux storage 4.19.0-0.bpo.2-amd64 #1 SMP Debian 4.19.16-1~bpo9+1 > > > (2019-02-07) x86_64 GNU/Linux > > > # btrfs --version > > > btrfs-progs v4.17 > > > # btrfs fi show > > > Label: none uuid: xxxxxxxxxxxxxxxxx > > > Total devices 4 FS bytes used 4.09TiB > > > devid 2 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sdc > > > devid 3 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sdb > > > devid 4 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sde > > > devid 5 size 2.73TiB used 158.06GiB path /dev/mapper/crypt-sdd > > > > > > # btrfs fi df . > > > Data, RAID5: total=4.59TiB, used=4.06TiB > > > System, RAID1: total=64.00MiB, used=528.00KiB > > > Metadata, RAID1: total=158.00GiB, used=29.43GiB > > > GlobalReserve, single: total=512.00MiB, used=0.00B > > > > > > > Thanks > > > > > > > > Jakub > > > > ^ permalink raw reply [flat|nested] 16+ messages in thread
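The inclusive-range semantics of the stripes filter quoted above can be sketched in a few lines (hypothetical stripe counts for illustration; on a real filesystem the kernel applies this test per block group):

```python
# Sketch of the balance `stripes=start..end` filter: a block group is
# selected when its stripe count falls inside the inclusive range.
def stripes_filter(n_stripes, start, end):
    return start <= n_stripes <= end  # both bounds are inclusive

# Hypothetical mix after adding a 4th device: old 3-wide raid5 block
# groups plus a couple of new 4-wide ones.
chunk_stripe_counts = [3, 3, 4, 3, 4]
selected = [n for n in chunk_stripe_counts if stripes_filter(n, 1, 3)]
print(selected)  # only the old 3-wide block groups get balanced
```

This is why `-dstripes=1..3` restripes exactly the pre-existing chunks after a fourth device is added, without re-balancing chunks that are already 4 wide.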
* Re: Balancing raid5 after adding another disk does not move/use any data on it 2019-03-15 18:01 ` Zygo Blaxell 2019-03-15 18:42 ` Jakub Husák @ 2019-03-15 20:31 ` Hans van Kranenburg 2019-03-16 6:07 ` Andrei Borzenkov 1 sibling, 1 reply; 16+ messages in thread From: Hans van Kranenburg @ 2019-03-15 20:31 UTC (permalink / raw) To: Zygo Blaxell, Jakub Husák; +Cc: linux-btrfs On 3/15/19 7:01 PM, Zygo Blaxell wrote: > On Wed, Mar 13, 2019 at 11:11:02PM +0100, Jakub Husák wrote: >> Sorry, fighting with this technology called "email" :) >> >> >> Hopefully better wrapped outputs: >> >> On 13. 03. 19 22:58, Jakub Husák wrote: >> >> >>> Hi, >>> >>> I added another disk to my 3-disk raid5 and ran a balance command. After >>> few hours I looked to output of `fi usage` to see that no data are being >>> used on the new disk. I got the same result even when balancing my raid5 >>> data or metadata. >>> >>> Next I tried to convert my raid5 metadata to raid1 (a good idea anyway) >>> and the new disk started to fill immediately (even though it received >>> the whole amount of metadata with replicas being spread among the other >>> drives, instead of being really "balanced". I know why this happened, I >>> don't like it but I can live with it, let's not go off topic here :)). 
>>> >>> Now my usage output looks like this: >>> >> # btrfs filesystem usage /mnt/data1 >> WARNING: RAID56 detected, not implemented >> Overall: >> Device size: 10.91TiB >> Device allocated: 316.12GiB >> Device unallocated: 10.61TiB >> Device missing: 0.00B >> Used: 58.86GiB >> Free (estimated): 0.00B (min: 8.00EiB) >> Data ratio: 0.00 >> Metadata ratio: 2.00 >> Global reserve: 512.00MiB (used: 0.00B) >> >> Data,RAID5: Size:4.59TiB, Used:4.06TiB >> /dev/mapper/crypt-sdb 2.29TiB >> /dev/mapper/crypt-sdc 2.29TiB >> /dev/mapper/crypt-sde 2.29TiB >> >> Metadata,RAID1: Size:158.00GiB, Used:29.43GiB >> /dev/mapper/crypt-sdb 53.00GiB >> /dev/mapper/crypt-sdc 53.00GiB >> /dev/mapper/crypt-sdd 158.00GiB >> /dev/mapper/crypt-sde 52.00GiB >> >> System,RAID1: Size:64.00MiB, Used:528.00KiB >> /dev/mapper/crypt-sdc 32.00MiB >> /dev/mapper/crypt-sdd 64.00MiB >> /dev/mapper/crypt-sde 32.00MiB >> >> Unallocated: >> /dev/mapper/crypt-sdb 393.04GiB >> /dev/mapper/crypt-sdc 393.01GiB >> /dev/mapper/crypt-sdd 2.57TiB >> /dev/mapper/crypt-sde 394.01GiB >> >>> >>> I'm now running `fi balance -dusage=10` (and rising the usage limit). I >>> can see that the unallocated space is rising as it's freeing the little >>> used chunks but still no data are being stored on the new disk. > > That is exactly what is happening: you are moving tiny amounts of data > into existing big empty spaces, so no new chunk allocations (which should > use the new drive) are happening. You have 470GB of data allocated > but not used, so you have up to 235 block groups to fill before the new > drive gets any data. > > Also note that you always have to do a full data balance when adding > devices to raid5 in order to make use of all the space, so you might > as well get started on that now. It'll take a while. 'btrfs balance > start -dstripes=1..3 /mnt/data1' will work for this case. > >>> I it some bug? Is `fi usage` not showing me something (as it states >>> "WARNING: RAID56 detected, not implemented")? 
>
> The warning just means the fields in the 'fi usage' output header,
> like "Free (estimate)", have bogus values because they're not computed
> correctly.

The output of the btrfs-usage-report which comes with the python-btrfs
library (since v11) might be interesting for you here. It actually will
show you pretty accurate numbers, and it also contains a section that
shows you exactly how much currently unallocatable raw disk space you
have on which disk. While moving things around with balance, you can
watch the numbers change.

>>> Or is there just too much
>>> free space on the first set of disks that the balancing is not bothering
>>> moving any data?
>
> Yes.  ;)
>
>>> If so, shouldn't it be really balancing (spreading) the data among all
>>> the drives to use all the IOPS capacity, even when the raid5 redundancy
>>> constraint is currently satisfied?
>
> btrfs divides the disks into chunks first, then spreads the data across
> the chunks.  The chunk allocation behavior spreads chunks across all the
> disks.  When you are adding a disk to raid5, you have to redistribute all
> the old data across all the disks to get balanced IOPS and space usage,
> hence the full balance requirement.
>
> If you don't do a full balance, it will eventually allocate data on
> all disks, but it will run out of space on sdb, sdc, and sde first,
> and then be unable to use the remaining 2TB+ on sdd.

Also, if you have a lot of empty space in the current allocations, btrfs
balance has the tendency to first start packing everything together
before allocating new (4 disk wide) block groups.

This is annoying, because it can result in moving the same data multiple
times during balance (into empty space of another existing block group,
and then again when that one has its turn, etc.).

So you want to get rid of empty space in existing block groups as soon
as possible. btrfs-balance-least-used can do this (also an example from
python-btrfs), by doing them in order of most empty one first.
A copy of the script with the following change will filter out block
groups that already span 4 drives:

diff --git a/bin/btrfs-balance-least-used b/bin/btrfs-balance-least-used
index 7005347..0b243a3 100755
--- a/bin/btrfs-balance-least-used
+++ b/bin/btrfs-balance-least-used
@@ -41,6 +41,8 @@ def load_block_groups(fs, max_used_pct):
     for chunk in fs.chunks():
         if not (chunk.type & btrfs.BLOCK_GROUP_DATA):
             continue
+        if len(chunk.stripes) > 3:
+            continue
         try:
             block_group = fs.block_group(chunk.vaddr, chunk.length)
             if block_group.used_pct <= max_used_pct:

https://github.com/knorrie/python-btrfs/tree/master/bin

>> # uname -a
>> Linux storage 4.19.0-0.bpo.2-amd64 #1 SMP Debian 4.19.16-1~bpo9+1
>> (2019-02-07) x86_64 GNU/Linux
>> # btrfs --version
>> btrfs-progs v4.17
>> # btrfs fi show
>> Label: none  uuid: xxxxxxxxxxxxxxxxx
>>         Total devices 4 FS bytes used 4.09TiB
>>         devid    2 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sdc
>>         devid    3 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sdb
>>         devid    4 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sde
>>         devid    5 size 2.73TiB used 158.06GiB path /dev/mapper/crypt-sdd
>>
>> # btrfs fi df .
>> Data, RAID5: total=4.59TiB, used=4.06TiB
>> System, RAID1: total=64.00MiB, used=528.00KiB
>> Metadata, RAID1: total=158.00GiB, used=29.43GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B

</commercials break>

Hans

^ permalink raw reply related	[flat|nested] 16+ messages in thread
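The ordering that btrfs-balance-least-used applies, combined with the stripe-count skip from the diff above, can be mimicked on plain data (a stand-in sketch: tuples of (vaddr, used_pct, stripe count) instead of real python-btrfs objects, since the real script needs root and a mounted btrfs filesystem):

```python
# Stand-in for btrfs-balance-least-used's ordering with the
# stripe-count filter applied: skip block groups wider than 3 stripes,
# then balance the emptiest remaining ones first, so free space is
# consolidated before any data has to be moved twice.
def balance_order(block_groups, max_used_pct=100, max_stripes=3):
    """block_groups: iterable of (vaddr, used_pct, n_stripes) tuples."""
    candidates = [bg for bg in block_groups
                  if bg[2] <= max_stripes and bg[1] <= max_used_pct]
    return sorted(candidates, key=lambda bg: bg[1])  # most empty first

# Hypothetical block groups: three old 3-wide ones, one new 4-wide one.
bgs = [(0x100, 80, 3), (0x200, 5, 3), (0x300, 40, 4), (0x400, 20, 3)]
print(balance_order(bgs))  # 4-wide block group at 0x300 is skipped
```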
* Re: Balancing raid5 after adding another disk does not move/use any data on it
  2019-03-15 20:31 ` Hans van Kranenburg
@ 2019-03-16  6:07   ` Andrei Borzenkov
  2019-03-16 16:34     ` Hans van Kranenburg
  2019-03-16 23:10     ` Zygo Blaxell
  0 siblings, 2 replies; 16+ messages in thread
From: Andrei Borzenkov @ 2019-03-16 6:07 UTC (permalink / raw)
To: Hans van Kranenburg, Zygo Blaxell, Jakub Husák; +Cc: linux-btrfs

15.03.2019 23:31, Hans van Kranenburg wrote:
...
>>
>>>> If so, shouldn't it be really balancing (spreading) the data among all
>>>> the drives to use all the IOPS capacity, even when the raid5 redundancy
>>>> constraint is currently satisfied?
>>
>> btrfs divides the disks into chunks first, then spreads the data across
>> the chunks.  The chunk allocation behavior spreads chunks across all the
>> disks.  When you are adding a disk to raid5, you have to redistribute all
>> the old data across all the disks to get balanced IOPS and space usage,
>> hence the full balance requirement.
>>
>> If you don't do a full balance, it will eventually allocate data on
>> all disks, but it will run out of space on sdb, sdc, and sde first,
>> and then be unable to use the remaining 2TB+ on sdd.
>
> Also, if you have a lot of empty space in the current allocations, btrfs
> balance has the tendency to first start packing everything together
> before allocating new (4 disk wide) block groups.
>
> This is annoying, because it can result in moving the same data multiple
> times during balance (into empty space of another existing block group,
> and then when that one has its turn again etc).
>
> So you want to get rid of empty space in existing block groups as soon
> as possible. btrfs-balance-least-used can do this, (also an example from
> python-btrfs), by doing them in order of most empty one first.
>

But if I understand the above correctly, it will still attempt to move
data into the next most-empty chunks first. Is there any way to force
allocation of new chunks?
Or, better, force usage of chunks with given stripe width as balance target? This thread actually made me wonder - is there any guarantee (or even tentative promise) about RAID stripe width from btrfs at all? Is it possible that RAID5 degrades to mirror by itself due to unfortunate space distribution? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Balancing raid5 after adding another disk does not move/use any data on it 2019-03-16 6:07 ` Andrei Borzenkov @ 2019-03-16 16:34 ` Hans van Kranenburg 2019-03-16 19:51 ` Hans van Kranenburg 2019-03-16 23:10 ` Zygo Blaxell 1 sibling, 1 reply; 16+ messages in thread From: Hans van Kranenburg @ 2019-03-16 16:34 UTC (permalink / raw) To: Andrei Borzenkov, Zygo Blaxell, Jakub Husák; +Cc: linux-btrfs On 3/16/19 7:07 AM, Andrei Borzenkov wrote: > 15.03.2019 23:31, Hans van Kranenburg пишет: > ... >>> >>>>> If so, shouldn't it be really balancing (spreading) the data among all >>>>> the drives to use all the IOPS capacity, even when the raid5 redundancy >>>>> constraint is currently satisfied? >>> >>> btrfs divides the disks into chunks first, then spreads the data across >>> the chunks. The chunk allocation behavior spreads chunks across all the >>> disks. When you are adding a disk to raid5, you have to redistribute all >>> the old data across all the disks to get balanced IOPS and space usage, >>> hence the full balance requirement. >>> >>> If you don't do a full balance, it will eventually allocate data on >>> all disks, but it will run out of space on sdb, sdc, and sde first, >>> and then be unable to use the remaining 2TB+ on sdd. >> >> Also, if you have a lot of empty space in the current allocations, btrfs >> balance has the tendency to first start packing everything together >> before allocating new (4 disk wide) block groups. >> >> This is annoying, because it can result in moving the same data multiple >> times during balance (into empty space of another existing block group, >> and then when that one has its turn again etc). >>> So you want to get rid of empty space in existing block groups as soon >> as possible. btrfs-balance-least-used can do this, (also an example from >> python-btrfs), by doing them in order of most empty one first. >> > > But if I understand the above correctly it will still attempt to move > data in next most empty chunks first. 
Balance feeds data back to the fs as new writes, so it will try filling
up existing block groups with the lowest vaddr first (when running in
nossd/ssd mode). Newly added block groups (/chunks) always get a new
vaddr which is higher than everything else, so they're chosen last and
only start filling once all the lower-numbered ones are packed with
data -- while balance keeps emptying and removing those lower ones.

> Is there any way to force allocation of new chunks? Or, better, force
> usage of chunks with given stripe width as balance target?

Nope. Or, the other way around, blacklisting everything that you know
you want to get rid of. Currently that's not possible. It would require
knobs that influence the extent allocator (e.g. prefer writing into the
chunk with the highest num_stripes first).

Conversion has a similar problem. For every chunk that gets converted,
you get a new empty one with the new target profile, and it's quite
possible that you first rewrite data a few times (depending on how
compacted everything already was) into the existing old-profile chunks
before actually starting to use the new profile.

Having a lot of empty space in existing block groups is something that
mainly happens after removing a lot of data. In that case, if you care,
compacting everything together with the least amount of data movement is
why I added the balance-least-used algorithm. Since we're not using the
"cluster" allocator for data any more (the ssd-option related change in
4.14), normal operation with equal amounts of data removed and added all
the time does not result in overallocation any more.

> This thread actually made me wonder - is there any guarantee (or even
> tentative promise) about RAID stripe width from btrfs at all? Is it
> possible that RAID5 degrades to mirror by itself due to unfortunate
> space distribution?

For RAID5, the minimum is two disks. So yes, if you add two disks and
don't forcibly rewrite all your data, it will happily start adding
two-disk RAID5 block groups if the other disks are full.
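The lowest-vaddr-first packing described above can be illustrated with a
toy model. This is plain Python for illustration only -- made-up
numbers, not python-btrfs and not the kernel's actual relocation code:

```python
def balance_pass(block_groups, capacity):
    """Process block groups in vaddr order, feeding each one's data back
    through the allocator, which fills the lowest-vaddr group with free
    space first.  block_groups maps vaddr -> used bytes (MiB here)."""
    for vaddr in sorted(block_groups):
        data = block_groups.pop(vaddr)  # empty this group out
        # Relocated extents are ordinary new writes:
        for target in sorted(block_groups):
            if data == 0:
                break
            moved = min(capacity - block_groups[target], data)
            block_groups[target] += moved
            data -= moved
        if data:  # leftover forces a brand new group, with a higher vaddr
            block_groups[max(block_groups, default=vaddr) + 1] = data
    return block_groups

# Three half-full 1024 MiB block groups; new (high-vaddr) groups only
# appear after the lower-numbered ones are packed completely full:
print(balance_pass({100: 512, 200: 512, 300: 512}, 1024))
# -> {301: 1024, 302: 512}
```

Note how data that started in the vaddr-100 group gets copied again when
groups 200 and 300 have their turn -- the repeated data movement
complained about earlier in the thread.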
Hans ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Balancing raid5 after adding another disk does not move/use any data on it 2019-03-16 16:34 ` Hans van Kranenburg @ 2019-03-16 19:51 ` Hans van Kranenburg 2019-03-17 20:52 ` Jakub Husák 0 siblings, 1 reply; 16+ messages in thread From: Hans van Kranenburg @ 2019-03-16 19:51 UTC (permalink / raw) To: Andrei Borzenkov, Zygo Blaxell, Jakub Husák; +Cc: linux-btrfs [-- Attachment #1: Type: text/plain, Size: 1351 bytes --] On 3/16/19 5:34 PM, Hans van Kranenburg wrote: > On 3/16/19 7:07 AM, Andrei Borzenkov wrote: >> [...] >> This thread actually made me wonder - is there any guarantee (or even >> tentative promise) about RAID stripe width from btrfs at all? Is it >> possible that RAID5 degrades to mirror by itself due to unfortunate >> space distribution? > > For RAID5, minimum is two disks. So yes, if you add two disks and don't > forcibly rewrite all your data, it will happily start adding two-disk > RAID5 block groups if the other disks are full. Attached an example that shows a list of used physical and virtual space ordered by chunk type (== block group flags) and also num_stripes (how many disks (or, dev extents)) are used. The btrfs-usage-report does not add this level of detail. (But maybe it would be interesting to add, but then I would add it into the btrfs.fs_usage code...) 
For the RAID56 with a big mess of different block groups with different
"horizontal size" this will be more interesting than what it shows here
as a test:

-# ./chunks_stripes_report.py /
flags            num_stripes     physical      virtual
-----            -----------     --------      -------
DATA                       1    759.00GiB    759.00GiB
SYSTEM|DUP                 2     64.00MiB     32.00MiB
METADATA|DUP               2      7.00GiB      3.50GiB

Hans

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: chunks_stripes_report.py --]
[-- Type: text/x-python; name="chunks_stripes_report.py", Size: 1010 bytes --]

#!/usr/bin/python3

import btrfs
from collections import defaultdict, Counter

physical_bytes = defaultdict(Counter)
virtual_bytes = defaultdict(Counter)

with btrfs.FileSystem('/') as fs:
    for chunk in fs.chunks():
        physical_bytes[chunk.type][chunk.num_stripes] += \
            btrfs.volumes.chunk_to_dev_extent_length(chunk) * chunk.num_stripes
        virtual_bytes[chunk.type][chunk.num_stripes] += chunk.length

report_lines = [
    ('flags', 'num_stripes', 'physical', 'virtual'),
    ('-----', '-----------', '--------', '-------'),
]
for flags, counter in physical_bytes.items():
    for num_stripes, pbytes in counter.items():
        report_lines.append((
            btrfs.utils.block_group_flags_str(flags),
            num_stripes,
            btrfs.utils.pretty_size(pbytes),
            btrfs.utils.pretty_size(virtual_bytes[flags][num_stripes]),
        ))
for report_line in report_lines:
    print("{: <16} {: >11} {: >12} {: >12}".format(*report_line))

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Balancing raid5 after adding another disk does not move/use any data on it
  2019-03-16 19:51 ` Hans van Kranenburg
@ 2019-03-17 20:52 ` Jakub Husák
  2019-03-17 22:53 ` Hans van Kranenburg
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Husák @ 2019-03-17 20:52 UTC (permalink / raw)
To: linux-btrfs

This is a great tool, Hans! This kind of overview should be part of
btrfs-progs.

Mine currently looks like this; I have a few more days to go with
rebalancing :)

flags            num_stripes     physical      virtual
-----            -----------     --------      -------
DATA|RAID5                 3      5.29TiB      3.53TiB
DATA|RAID5                 4    980.00GiB    735.00GiB
SYSTEM|RAID1               2    128.00MiB     64.00MiB
METADATA|RAID1             2    314.00GiB    157.00GiB

Btw, I checked the other utils in your python-btrfs and it seems that
they are, sadly, not installed with a simple pip install, which would be
great. Maybe it needs a few lines in setup.py (I'm not too familiar with
Python packaging)?

On 16. 03. 19 20:51, Hans van Kranenburg wrote:
> On 3/16/19 5:34 PM, Hans van Kranenburg wrote:
>> On 3/16/19 7:07 AM, Andrei Borzenkov wrote:
>>> [...]
>>> This thread actually made me wonder - is there any guarantee (or even
>>> tentative promise) about RAID stripe width from btrfs at all? Is it
>>> possible that RAID5 degrades to mirror by itself due to unfortunate
>>> space distribution?
>> For RAID5, minimum is two disks. So yes, if you add two disks and don't
>> forcibly rewrite all your data, it will happily start adding two-disk
>> RAID5 block groups if the other disks are full.
> Attached an example that shows a list of used physical and virtual space
> ordered by chunk type (== block group flags) and also num_stripes (how
> many disks (or, dev extents)) are used. The btrfs-usage-report does not
> add this level of detail. (But maybe it would be interesting to add, but
> then I would add it into the btrfs.fs_usage code...)
> > For the RAID56 with a big mess of different block groups with different > "horizontal size" this will be more interesting than what it shows here > as test: > > -# ./chunks_stripes_report.py / > flags num_stripes physical virtual > ----- ----------- -------- ------- > DATA 1 759.00GiB 759.00GiB > SYSTEM|DUP 2 64.00MiB 32.00MiB > METADATA|DUP 2 7.00GiB 3.50GiB > > > Hans ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Balancing raid5 after adding another disk does not move/use any data on it 2019-03-17 20:52 ` Jakub Husák @ 2019-03-17 22:53 ` Hans van Kranenburg 2019-03-18 19:54 ` Marc Joliet 0 siblings, 1 reply; 16+ messages in thread From: Hans van Kranenburg @ 2019-03-17 22:53 UTC (permalink / raw) To: Jakub Husák, linux-btrfs Hi, On 3/17/19 9:52 PM, Jakub Husák wrote: > This is a great tool Hans! This kind of overview should be a part of > btrfs-progs. Thing is... this seems super useful because it's super useful for the exact thing you are currently doing and trying to find out. Fun thing is, there are a thousand other things in other scenarios that are interesting to know. Should btrfs-progs implement hardcoded solutions for all of them? Or cover 80% of what's needed with 20% of effort? The main reason why I started writing the python-btrfs library is that it allows me to just quickly write a few lines of code to get some information, exactly for what I want to know at that point. In the previous example, writing the table with output is already more than three times as many lines of code than getting the actual info, which is a simple 'for chunk in fs.chunks()' and then boom, you have a lot of info to do something with. https://python-btrfs.readthedocs.io/en/stable/btrfs.html#btrfs.ctree.Chunk > Mine looks currently like this, I have a few more days to go with > rebalancing :) > > flags num_stripes physical virtual > ----- ----------- -------- ------- > DATA|RAID5 3 5.29TiB 3.53TiB > DATA|RAID5 4 980.00GiB 735.00GiB > SYSTEM|RAID1 2 128.00MiB 64.00MiB > METADATA|RAID1 2 314.00GiB 157.00GiB Ha, nice! > Btw, I checked the other utils in your python-btrfs and it seems that > they are, sadly, not installed with simple pip install, which would be > great. Maybe it needs a few lines in setup.py (i'm not too familiar with > python packaging)? Can you share how you're using this? Personally, I never use pip for anything, so I might not be putting in there what users expect. 
My latest thought about this was that users use pip to have some library dependency for something else, so they don't need standalone programs and example scripts? I mainly have debian packages installed everywhere, and otherwise I'm doing a git clone of the project from github and mess around in there, with added benefit that I can view history on all files. Hans ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Balancing raid5 after adding another disk does not move/use any data on it 2019-03-17 22:53 ` Hans van Kranenburg @ 2019-03-18 19:54 ` Marc Joliet 0 siblings, 0 replies; 16+ messages in thread From: Marc Joliet @ 2019-03-18 19:54 UTC (permalink / raw) To: linux-btrfs [-- Attachment #1: Type: text/plain, Size: 1094 bytes --] Am Sonntag, 17. März 2019, 23:53:45 CET schrieb Hans van Kranenburg: > My latest thought about this was that users use > pip to have some library dependency for something else, so they don't > need standalone programs and example scripts? My current understanding is that that Python land kinda wants everybody to use pip to install anything written in Python (except the science people, who gravitate more towards conda, though it can wrap pip for software not packaged natively). So yeah, it's perfectly natural to install scripts with pip, though I forgot where exactly in setup.py you have to specify them. Examples include SCons (which is also distributed via pip), various linters such as flake8, and test frameworks such as nose which also come with scripts needed to drive them. (Also, I seem to remember that there are provisions for specifying examples separately from regular scripts, but I forgot the specifics.) Greetings -- Marc Joliet -- "People who think they know everything really annoy those of us who know we don't" - Bjarne Stroustrup [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
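For what it's worth, standalone files like the ones in bin/ are usually
declared via the `scripts` keyword in setup.py (or, alternatively, an
`entry_points` console_scripts table). A sketch of what that could look
like -- the version number and the entry-point target below are
illustrative guesses, not the actual python-btrfs packaging:

```python
# Illustrative setuptools sketch; the entry-point module path is an
# assumption, not the real python-btrfs setup.py.
from setuptools import setup

setup(
    name='btrfs',
    version='0.0',
    packages=['btrfs'],
    # Plain executable files copied into the environment's bin/:
    scripts=[
        'bin/btrfs-balance-least-used',
        'bin/btrfs-usage-report',
    ],
    # The alternative is entry points, which generate wrapper scripts
    # (and work on Windows), but require an importable main() function:
    # entry_points={
    #     'console_scripts': [
    #         'btrfs-balance-least-used = btrfs.cli:balance_least_used',
    #     ],
    # },
)
```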
* Re: Balancing raid5 after adding another disk does not move/use any data on it 2019-03-16 6:07 ` Andrei Borzenkov 2019-03-16 16:34 ` Hans van Kranenburg @ 2019-03-16 23:10 ` Zygo Blaxell 1 sibling, 0 replies; 16+ messages in thread From: Zygo Blaxell @ 2019-03-16 23:10 UTC (permalink / raw) To: Andrei Borzenkov; +Cc: Hans van Kranenburg, Jakub Husák, linux-btrfs [-- Attachment #1: Type: text/plain, Size: 8186 bytes --] On Sat, Mar 16, 2019 at 09:07:17AM +0300, Andrei Borzenkov wrote: > 15.03.2019 23:31, Hans van Kranenburg пишет: > ... > >> > >>>> If so, shouldn't it be really balancing (spreading) the data among all > >>>> the drives to use all the IOPS capacity, even when the raid5 redundancy > >>>> constraint is currently satisfied? > >> > >> btrfs divides the disks into chunks first, then spreads the data across > >> the chunks. The chunk allocation behavior spreads chunks across all the > >> disks. When you are adding a disk to raid5, you have to redistribute all > >> the old data across all the disks to get balanced IOPS and space usage, > >> hence the full balance requirement. > >> > >> If you don't do a full balance, it will eventually allocate data on > >> all disks, but it will run out of space on sdb, sdc, and sde first, > >> and then be unable to use the remaining 2TB+ on sdd. > > > > Also, if you have a lot of empty space in the current allocations, btrfs > > balance has the tendency to first start packing everything together > > before allocating new (4 disk wide) block groups. > > > > This is annoying, because it can result in moving the same data multiple > > times during balance (into empty space of another existing block group, > > and then when that one has its turn again etc). > > > So you want to get rid of empty space in existing block groups as soon > > as possible. btrfs-balance-least-used can do this, (also an example from > > python-btrfs), by doing them in order of most empty one first. 
>
> But if I understand the above correctly it will still attempt to move
> data in next most empty chunks first. Is there any way to force
> allocation of new chunks? Or, better, force usage of chunks with given
> stripe width as balance target?
>
> This thread actually made me wonder - is there any guarantee (or even
> tentative promise) about RAID stripe width from btrfs at all? Is it
> possible that RAID5 degrades to mirror by itself due to unfortunate
> space distribution?

Note that the data layout of RAID5 with 1 data disk, 1 parity disk, and
even parity is identical to RAID1 with 1 data disk and 1 mirror copy.
The two algorithms produce the same data layout with those parameters.
IIRC btrfs uses odd parity, so on btrfs the RAID5 parity stripes are the
complement of the data stripes, but they don't need to be: with even
parity on 2 disks, the data and parity blocks are identical and
interchangeable.

If you have RAID5 with non-equal device sizes, as long as the two
largest disks are the same size, btrfs will adjust the stripe width to
match the disks with free space available, subject to the constraint
that the resulting block group must have enough disks to survive one
disk failure.

e.g. for RAID5 with 5 disks, 2x3TB, 2x2TB, 1x1TB, you get three zones:

        -> raid5 fills smallest unallocated spaces first, all drives ->
        3TB  AAAAAAAAAABBBBBBBBBBCCCCCCCCCC
        3TB  AAAAAAAAAABBBBBBBBBBCCCCCCCCCC
        2TB  AAAAAAAAAABBBBBBBBBB
        2TB  AAAAAAAAAABBBBBBBBBB
        1TB  AAAAAAAAAA

Zone "A" is 5 disks wide, zone "B" is 4 disks wide, and zone "C" is 2
disks wide (each letter represents 100x1GB chunks). This is not
necessarily how the data is laid out on disk--the btrfs allocator will
store data on disk in some permutation of this order; however, the total
number of chunks in each zone on each disk is as shown.
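The zone arithmetic above can be reproduced with a few lines of
simulation -- a simplified sketch (one unit per device per chunk, always
striping across every device that still has unallocated space), not the
actual kernel allocator:

```python
def allocate_raid5(free, min_width=2):
    """free: unallocated units per device.  Greedily allocate chunks
    striped across every device with space left (raid5 needs >= 2);
    return a dict mapping stripe width -> chunks allocated at that width."""
    widths = {}
    while True:
        members = [i for i, f in enumerate(free) if f > 0]
        if len(members) < min_width:
            break  # space left on fewer than min_width devices is unusable
        for i in members:
            free[i] -= 1
        widths[len(members)] = widths.get(len(members), 0) + 1
    return widths

# The example above: 2x3TB, 2x2TB, 1x1TB, in units of 100GB:
print(allocate_raid5([30, 30, 20, 20, 10]))
# -> {5: 10, 4: 10, 2: 10}: zone A is 5 wide, B is 4 wide, C is 2 wide
```

With all devices the same size there is only one zone, and leftover
space on a single device is simply never allocated.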
For -draid5 -mraid1, you can get patterns like this:

        <- raid1 fills largest unallocated spaces first, 2 drives <-
        3TB  5AAAAAAAA4BBBBBBBBB3CCCCCCCC21
        3TB  5AAAAAAAA4BBBBBBBBB3CCCCCCCC21
        2TB  6AAAAAAAADBBBBBBBBBC
        2TB  6AAAAAAAADBBBBBBBBBC
        1TB  UAAAAAAAAD

where numbered zones are raid1 metadata chunks, zone "D" is raid5 3
disks wide, and "U" is the worst-case one unusable 1GB chunk (not to
scale) in arrays with an odd number of disks. The numbered zones occupy
space that would normally form a full-width raid5 stripe in the zone, so
the last raid5 block groups in each zone are less wide (i.e. the
metadata chunks in the "B" zone make some stripes in the "B" zone space
behave like stripes in "C" zone space).

If the allocations start from empty disks and there are no array
reshaping operations (convert profile, add/delete/resize devices) then
the allocator should allocate all the usable space as efficiently as
possible. In the -draid5 -mraid1 case, it would be slightly more
efficient to allocate all the metadata in the "C" zone so it doesn't
make any narrower stripes in the "B" and "A" zones. Typically this is
exactly what happens, since all the "A" and "B" space must be allocated
before raid5 can reach the "C" zone from the left, while all the "C"
space must be allocated before raid1 can reach the "B" zone from the
right, and the two allocators only interact when the filesystem is
completely full.
        <- raid1 fills from the right, raid5 from the left <-
        3TB  AAAAAAAAAABBBBBBBBBBCCCC654321
        3TB  AAAAAAAAAABBBBBBBBBBCCCC654321
        2TB  AAAAAAAAAABBBBBBBBBB
        2TB  AAAAAAAAAABBBBBBBBBB
        1TB  AAAAAAAAAA
        -> they meet somewhere in the middle, no space wasted ->

If all the drives are the same size, then raid5 and raid1 meet
immediately in zone "A":

        <- raid1 fills from the right, raid5 from the left <-
        3TB  AAAAAAAAAAAAAAAAAAAAAAAAAAA421
        3TB  AAAAAAAAAAAAAAAAAAAAAAAAAAA431
        3TB  AAAAAAAAAAAAAAAAAAAAAAAAAAAU32
        -> they meet somewhere in the middle, up to 1GB wasted ->

There used to be a bug (maybe there still is?) where the allocator would
randomly place about 0.1% of chunks on a non-optimal disk (due to a race
condition?). That can theoretically lose a few GB of space per TB by
shrinking the stripe width on a few block groups, or stealing a mirror
chunk from the largest disk in a raid1 array with multiple disk sizes.
You can get rid of those using the 'stripes' filter for balance--though
only 0.1% of the space is gained or lost this way, so it may not be
worth the IO cost.

If you are converting or reshaping an array, the nice rules above don't
hold any more. e.g. if we replace a 1TB drive with a 3TB drive, we get
2TB unallocated ("_"):

        3TB  AAAAAAAAAABBBBBBBBBBCCCC654321
        3TB  AAAAAAAAAABBBBBBBBBBCCCC654321
        2TB  AAAAAAAAAABBBBBBBBBB
        2TB  AAAAAAAAAABBBBBBBBBB
        3TB  AAAAAAAAAA____________________

Now we have no available space, because there are no free chunks on two
or more drives (i.e. all the free space is on 1 drive and all the RAID
profiles we are using require 2). Upgrade another disk, and...

        3TB  AAAAAAAAAABBBBBBBBBBCCCC654321
        3TB  AAAAAAAAAABBBBBBBBBBCCCC654321
        2TB  AAAAAAAAAABBBBBBBBBB
        3TB  AAAAAAAAAABBBBBBBBBB__________
        3TB  AAAAAAAAAA____________________

Now we have 1TB of free space, in stripes 2 disks wide.
Without a balance, it would fill up like this:

        3TB  AAAAAAAAAABBBBBBBBBBCCCC654321
        3TB  AAAAAAAAAABBBBBBBBBBCCCC654321
        2TB  AAAAAAAAAABBBBBBBBBB
        3TB  AAAAAAAAAABBBBBBBBBBCCCCCCCCCC
        3TB  AAAAAAAAAACCCCCCCCCCXXXXXXXXXX
        -> raid5 fills smallest unallocated spaces first on all drives ->

Note the "C" zone here is still stripes 2 disks wide, so a lot of space
is wasted by narrow stripes. Even the diagram makes it look like we did
something wrong--we don't have the nice orderly fill pattern. 1TB is
unusable, and the free space estimated by 'df' was egregiously wrong the
whole time.

Full balance fixes that, and we get some unallocated space that is
usable:

        -> raid5 from left to right ->
        3TB  AAAAAAAAAAAAAAAAAAA________531
        3TB  AAAAAAAAAAAAAAAAAAA________531
        2TB  AAAAAAAAAAAAAAAAAAA_
        3TB  AAAAAAAAAAAAAAAAAAA________642
        3TB  AAAAAAAAAAAAAAAAAAA________642
        <- raid1 from right to left <-

which can then be filled up like this:

        3TB  AAAAAAAAAAAAAAAAAAAABBBBBB7531
        3TB  AAAAAAAAAAAAAAAAAAAABBBBBB7531
        2TB  AAAAAAAAAAAAAAAAAAAA
        3TB  AAAAAAAAAAAAAAAAAAAABBBBBBC642
        3TB  AAAAAAAAAAAAAAAAAAAABBBBBBC642

By the time our hypothetical filesystem was full, there was another
metadata chunk allocated, so we end up with one 1GB block group in zone
"C" with 2 disks--but at most one.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply [flat|nested] 16+ messages in thread
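The 'stripes' balance filter mentioned above takes a range of stripe
counts, so narrow block groups can be targeted directly. For the 5-disk
example, something along these lines should rewrite only the block
groups that are less than full width -- a sketch with a placeholder
mount point; check the btrfs-balance man page for your progs version:

```shell
# Rewrite only data block groups currently striped over fewer than 5
# devices; full-width block groups are left alone.
btrfs balance start -dstripes=1..4 /mnt/data

# Watch progress from another terminal:
btrfs balance status /mnt/data
```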
* Balancing raid5 after adding another disk does not move/use any data on it
@ 2019-03-13 21:58 Jakub Husák
  2019-03-14 21:31 ` Chris Murphy
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Husák @ 2019-03-13 21:58 UTC (permalink / raw)
To: linux-btrfs

Hi,

I added another disk to my 3-disk raid5 and ran a balance command.
After a few hours I looked at the output of `fi usage` and saw that no
data was being stored on the new disk. I got the same result even when
balancing my raid5 data or metadata.

Next I tried to convert my raid5 metadata to raid1 (a good idea anyway)
and the new disk started to fill immediately (even though it received
the whole amount of metadata, with replicas being spread among the
other drives, instead of being really "balanced". I know why this
happened, I don't like it but I can live with it, let's not go off
topic here :)).

Now my usage output looks like this:

# btrfs filesystem usage /mnt/data
WARNING: RAID56 detected, not implemented
Overall:
    Device size:                  10.91TiB
    Device allocated:            316.12GiB
    Device unallocated:           10.61TiB
    Device missing:                  0.00B
    Used:                         58.88GiB
    Free (estimated):                0.00B    (min: 8.00EiB)
    Data ratio:                       0.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB    (used: 184.94MiB)

Data,RAID5: Size:4.59TiB, Used:4.06TiB
   /dev/mapper/crypt-sdb    2.29TiB
   /dev/mapper/crypt-sdc    2.29TiB
   /dev/mapper/crypt-sde    2.29TiB

Metadata,RAID1: Size:158.00GiB, Used:29.44GiB
   /dev/mapper/crypt-sdb   53.00GiB
   /dev/mapper/crypt-sdc   53.00GiB
   /dev/mapper/crypt-sdd  158.00GiB
   /dev/mapper/crypt-sde   52.00GiB

System,RAID1: Size:64.00MiB, Used:528.00KiB
   /dev/mapper/crypt-sdc   32.00MiB
   /dev/mapper/crypt-sdd   64.00MiB
   /dev/mapper/crypt-sde   32.00MiB

Unallocated:
   /dev/mapper/crypt-sdb  392.04GiB
   /dev/mapper/crypt-sdc  392.01GiB
   /dev/mapper/crypt-sdd    2.57TiB
   /dev/mapper/crypt-sde  393.01GiB

I'm now running `fi balance -dusage=10` (and raising the usage limit).
I can see that the unallocated space is growing as it frees the
little-used chunks, but still no data is being stored on the new disk.

Is it some bug? Is `fi usage` not showing me something (as it states
"WARNING: RAID56 detected, not implemented")? Or is there just too much
free space on the first set of disks, so that balancing doesn't bother
moving any data?

If so, shouldn't it really be balancing (spreading) the data among all
the drives to use all the IOPS capacity, even when the raid5 redundancy
constraint is currently satisfied?

# uname -a
Linux keeper 4.19.0-0.bpo.2-amd64 #1 SMP Debian 4.19.16-1~bpo9+1
(2019-02-07) x86_64 GNU/Linux
# btrfs --version
btrfs-progs v4.17
# btrfs fi show
Label: none  uuid: xxxxxxxxxxxxxxxxxxxxxxxxxx
        Total devices 4 FS bytes used 4.09TiB
        devid 2 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sdc
        devid 3 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sdb
        devid 4 size 2.73TiB used 2.34TiB path /dev/mapper/crypt-sde
        devid 5 size 2.73TiB used 158.06GiB path /dev/mapper/crypt-sdd

# btrfs fi df .
Data, RAID5: total=4.59TiB, used=4.06TiB
System, RAID1: total=64.00MiB, used=528.00KiB
Metadata, RAID1: total=158.00GiB, used=29.43GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Thanks

Jakub

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Balancing raid5 after adding another disk does not move/use any data on it
  2019-03-13 21:58 Jakub Husák
@ 2019-03-14 21:31 ` Chris Murphy
  0 siblings, 0 replies; 16+ messages in thread
From: Chris Murphy @ 2019-03-14 21:31 UTC (permalink / raw)
To: Jakub Husák; +Cc: Btrfs BTRFS

On Wed, Mar 13, 2019 at 3:58 PM Jakub Husák <jakub@husak.pro> wrote:
>
> Hi,
>
> I added another disk to my 3-disk raid5 and ran a balance command.

What exact commands did you use for the two operations?

> After a few hours I looked at the output of `fi usage` and saw that no
> data was being stored on the new disk. I got the same result even when
> balancing my raid5 data or metadata.
>
> Next I tried to convert my raid5 metadata to raid1 (a good idea anyway)
> and the new disk started to fill immediately (even though it received
> the whole amount of metadata, with replicas being spread among the
> other drives, instead of being really "balanced". I know why this
> happened, I don't like it but I can live with it, let's not go off
> topic here :)).

They could be related problems. Unclear.

I suggest grabbing btrfs-debugfs from upstream btrfs-progs and running
`sudo btrfs-debugfs -b /mntpoint/`, and let's see what the block group
distribution looks like.
https://github.com/kdave/btrfs-progs/blob/master/btrfs-debugfs

> I'm now running `fi balance -dusage=10` (and raising the usage limit).
> I can see that the unallocated space is growing as it frees the
> little-used chunks, but still no data is being stored on the new disk.
>
> Is it some bug?

It's possible, but not enough information. The balance code is
complicated.

> If so, shouldn't it really be balancing (spreading) the data among all
> the drives to use all the IOPS capacity, even when the raid5 redundancy
> constraint is currently satisfied?

I'd expect that it should copy extents from old 3-stripe block groups
to new 4-stripe block groups.
However, there have been some improvements related to block group
management and enospc avoidance, where existing block groups get filled
first before new block groups are created, and I wonder if that's
what's going on here, but it's speculation.

What do you get for

btrfs insp dump-t -t 5 /dev/   ## device, not mountpoint; works if the
                               ## fs is mounted, but ideally not in use
btrfs insp dump-s -f /dev/     ## same

Also, there are no significant changes in raid56.c between 4.19.16 and
5.0.2, but there have been some volumes.c changes.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/fs/btrfs/volumes.c?id=v5.0.2&id2=v4.19.16

Anyway, I would stop making changes for now and make sure your backups
are up to date as a top priority. Then it's safer to poke this with a
stick and see what's going on and how to get it to cooperate.

-- 
Chris Murphy

^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads: [~2019-03-18 19:54 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2019-03-13 22:11 Balancing raid5 after adding another disk does not move/use any data on it Jakub Husák
2019-03-14 14:59 ` Noah Massey
2019-03-14 15:08   ` Noah Massey
2019-03-15 18:01 ` Zygo Blaxell
2019-03-15 18:42   ` Jakub Husák
2019-03-15 18:59     ` Zygo Blaxell
2019-03-15 20:31       ` Hans van Kranenburg
2019-03-16  6:07         ` Andrei Borzenkov
2019-03-16 16:34           ` Hans van Kranenburg
2019-03-16 19:51             ` Hans van Kranenburg
2019-03-17 20:52               ` Jakub Husák
2019-03-17 22:53                 ` Hans van Kranenburg
2019-03-18 19:54                   ` Marc Joliet
2019-03-16 23:10         ` Zygo Blaxell
  -- strict thread matches above, loose matches on Subject: below --
2019-03-13 21:58 Jakub Husák
2019-03-14 21:31 ` Chris Murphy