* RAID-1 refuses to balance large drive
@ 2016-03-23 0:47 Brad Templeton
2016-03-23 4:01 ` Qu Wenruo
0 siblings, 1 reply; 35+ messages in thread
From: Brad Templeton @ 2016-03-23 0:47 UTC (permalink / raw)
To: linux-btrfs
I have a RAID 1, and was running a bit low, so replaced a 2TB drive with
a 6TB. The other drives are a 3TB and a 4TB. After switching the
drive, I did a balance and ... essentially nothing changed. It did not
rebalance chunks over to the 6TB drive off of the other 2 drives. I
found that odd, and wondered if it would do it as needed, but as time went
on, the filesystem genuinely filled up.
Making inquiries on the IRC channel, it was suggested that perhaps the drives
were too full for a balance, but I would estimate they had at least 50GB free
when I swapped. As a test, I added a fourth device, a spare 20GB partition,
and did a balance. The balance did indeed balance the 3 small drives, so
they now each have 6GB unallocated, but the big drive remained unchanged.
The balance reported that it operated on almost all the chunks, though.
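
(For reference, the add-and-balance test described above would have been
something along these lines; /dev/sda1 is the spare partition shown in the
listing below, and the exact invocation may have differed:)

    # add the spare 20GB partition as a fourth device, then run a full balance
    btrfs device add /dev/sda1 /local
    btrfs balance start /local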
Linux kernel 4.2.0 (Ubuntu Wily)
Label: 'butter' uuid: a91755d4-87d8-4acd-ae08-c11e7f1f5438
Total devices 4 FS bytes used 3.88TiB
devid 1 size 3.62TiB used 3.62TiB path /dev/sdi2
devid 2 size 2.73TiB used 2.72TiB path /dev/sdh
devid 3 size 5.43TiB used 1.42TiB path /dev/sdg2
devid 4 size 20.00GiB used 14.00GiB path /dev/sda1
btrfs fi usage /local
Overall:
Device size: 11.81TiB
Device allocated: 7.77TiB
Device unallocated: 4.04TiB
Device missing: 0.00B
Used: 7.76TiB
Free (estimated): 2.02TiB (min: 2.02TiB)
Data ratio: 2.00
Metadata ratio: 2.00
Global reserve: 512.00MiB (used: 0.00B)
Data,RAID1: Size:3.87TiB, Used:3.87TiB
/dev/sda1 14.00GiB
/dev/sdg2 1.41TiB
/dev/sdh 2.72TiB
/dev/sdi2 3.61TiB
Metadata,RAID1: Size:11.00GiB, Used:9.79GiB
/dev/sdg2 5.00GiB
/dev/sdh 7.00GiB
/dev/sdi2 10.00GiB
System,RAID1: Size:32.00MiB, Used:572.00KiB
/dev/sdg2 32.00MiB
/dev/sdi2 32.00MiB
Unallocated:
/dev/sda1 6.00GiB
/dev/sdg2 4.02TiB
/dev/sdh 5.52GiB
/dev/sdi2 7.36GiB
----------------------
btrfs fi df /local
Data, RAID1: total=3.87TiB, used=3.87TiB
System, RAID1: total=32.00MiB, used=572.00KiB
Metadata, RAID1: total=11.00GiB, used=9.79GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
I would have presumed that a balance would take chunks mirrored on both the
3TB and 4TB drives and move one copy over to the 6TB until all three had
about 1.3TB of unallocated space. But this does not happen. Any clues on
how to make it happen?
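
(A minimal sketch of things that could be tried or reported, assuming the
/local mount point from the listings above; the devid balance filter selects
only chunks that have a stripe on the given device:)

    # rebalance only the chunks that touch the full drives (devid 1 and 2)
    btrfs balance start -ddevid=1 /local
    btrfs balance start -ddevid=2 /local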
^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: RAID-1 refuses to balance large drive
  2016-03-23  0:47 RAID-1 refuses to balance large drive Brad Templeton
@ 2016-03-23  4:01 ` Qu Wenruo
  2016-03-23  4:47   ` Brad Templeton
  0 siblings, 1 reply; 35+ messages in thread
From: Qu Wenruo @ 2016-03-23 4:01 UTC (permalink / raw)
  To: bradtem, linux-btrfs

Brad Templeton wrote on 2016/03/22 17:47 -0700:
> I have a RAID 1, and was running a bit low, so replaced a 2TB drive with
> a 6TB. The other drives are a 3TB and a 4TB. After switching the
> drive, I did a balance and ... essentially nothing changed. It did not
> rebalance chunks over to the 6TB drive off of the other 2 drives. I
> found that odd, and wondered if it would do it as needed, but as time went
> on, the filesystem genuinely filled up.

Did you resize the replaced device to max?

Without a resize, btrfs still considers that it can only use 2TB of the 6TB
device.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 35+ messages in thread
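
(A minimal sketch of the resize Qu is suggesting, assuming the 6TB drive is
devid 3 as in the listing above:)

    # grow btrfs's view of devid 3 to the full size of the underlying partition
    btrfs filesystem resize 3:max /local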
* Re: RAID-1 refuses to balance large drive
  2016-03-23  4:01 ` Qu Wenruo
@ 2016-03-23  4:47   ` Brad Templeton
  2016-03-23  5:42     ` Chris Murphy
  0 siblings, 1 reply; 35+ messages in thread
From: Brad Templeton @ 2016-03-23 4:47 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

That's rather counter-intuitive behaviour.  In most filesystems, resizes are
needed when you do things like change the size of an underlying partition,
or when you weren't using all of the partition.  When you add one drive with
"device add" and then remove another with "device delete", why and how would
the added device know to size itself to the device that you are planning to
delete?  I.e. I don't see how it could know (you add the new drive before
even telling it you want to remove the old one), and I also can't see a
reason it would not use all of the drive you tell it to add.

In any event, I did a "btrfs fi resize 3:max /local" on the 6TB as you
suggest, and have another balance running, but like all the others it
appears to be doing nothing, though of course it will take hours.  Are you
sure it works that way?  Even before the resize, as you see below, it
indicates the volume is 6TB with 4TB of unallocated space.  It is only the
df that says full (and the fact that there is no unallocated space on the
3TB and 4TB drives).

On 03/22/2016 09:01 PM, Qu Wenruo wrote:
> Did you resize the replaced device to max?
>
> Without a resize, btrfs still considers that it can only use 2TB of the
> 6TB device.
>
>> devid 1 size 3.62TiB used 3.62TiB path /dev/sdi2
>> devid 2 size 2.73TiB used 2.72TiB path /dev/sdh
>> devid 3 size 5.43TiB used 1.42TiB path /dev/sdg2

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: RAID-1 refuses to balance large drive
  2016-03-23  4:47 ` Brad Templeton
@ 2016-03-23  5:42   ` Chris Murphy
  [not found]           ` <56F22F80.501@gmail.com>
  0 siblings, 1 reply; 35+ messages in thread
From: Chris Murphy @ 2016-03-23 5:42 UTC (permalink / raw)
  To: bradtem; +Cc: Qu Wenruo, Btrfs BTRFS

On Tue, Mar 22, 2016 at 10:47 PM, Brad Templeton <bradtem@gmail.com> wrote:
> In any event, I did a "btrfs fi resize 3:max /local" on the 6TB as you
> suggest, and have another balance running, but like all the others it
> appears to be doing nothing, though of course it will take hours.  Are you
> sure it works that way?

It does work that way, and I agree offhand that the lack of an automatic
resize to max is counter-intuitive. I'd think the user has implicitly set
the size they want by handing the device over to Btrfs, be it a whole
device, partition or LV. There might be some notes in the mail archive, and
possibly comments in btrfs-progs, that explain the logic.

devid 1 size 3.62TiB used 3.62TiB path /dev/sdi2
devid 2 size 2.73TiB used 2.72TiB path /dev/sdh
devid 3 size 5.43TiB used 1.42TiB path /dev/sdg

Also note that after a successful balance this will not be evenly allocated,
because the device sizes aren't even. Simplistically it'll do something like
this: copy 1 chunks go on devid 3 and copy 2 chunks on devid 1 until the
free space on devid 1 equals the free space on devid 2. Then it'll start
alternating copy 2 chunks between devid 1 and 2, while copy 1 chunks
continue to be written to devid 3. That happens until free space on all
three is equal, and then allocation alternates among all three to try to
maintain approximately equal free space remaining.

You might find this helpful: http://carfax.org.uk/btrfs-usage/

--
Chris Murphy

^ permalink raw reply	[flat|nested] 35+ messages in thread
[parent not found: <56F22F80.501@gmail.com>]
* Re: RAID-1 refuses to balance large drive
  [not found] ` <56F22F80.501@gmail.com>
@ 2016-03-23  6:17   ` Chris Murphy
  2016-03-23 16:51     ` Brad Templeton
  0 siblings, 1 reply; 35+ messages in thread
From: Chris Murphy @ 2016-03-23 6:17 UTC (permalink / raw)
  To: bradtem, Btrfs BTRFS, Qu Wenruo

On Tue, Mar 22, 2016 at 11:54 PM, Brad Templeton <bradtem@gmail.com> wrote:
> Actually, the URL suggests that all the space will be used, which is
> what I had read about btrfs, that it handled this.

It will. But it does this by dominating writes to the devices that have the
most free space, until all devices have the same free space.

> But again, how could it possibly know to restrict the new device to only
> using 2TB?

In your case, before resizing it, it's just inheriting the size from the
device being replaced.

> Stage one: Add the new 6TB device.  The 2TB device is still present.
>
> Stage two: Remove the 2TB device.

OK, this is confusing. In your first post you said replaced. That suggests
you used 'btrfs replace start' rather than 'btrfs device add' followed by
'btrfs device remove'. So which did you do?

If you did the latter, then no resize is necessary.

> The system copies everything on it to the device which has the most space,
> the empty 6TB device.  But you are saying it decides to _shrink_ the 6TB
> device now that we know it is a 2TB device being removed?

No, I'm not. The source of confusion appears to be that you're unfamiliar
with 'btrfs replace', so you are using "replaced" to mean 'dev add' followed
by 'dev remove'.

This line:
devid 3 size 5.43TiB used 1.42TiB path /dev/sdg2

suggests it's using the entire 6TB of the newly added drive; it's already at
max size.

> We didn't know the 2TB would be removed when we added the 6TB, so I just
> can't fathom why the code would do that.  In addition, the stats I get
> back say it didn't do that.

I don't understand the first part. Whether you asked for 'dev remove' or you
used 'replace', both of those mean removing some device. You have to specify
the device to be removed.

Now might be a good time to actually write out the exact commands you've
used.

> More to the point, after the resize, the balance is still not changing any
> size numbers.  It should be moving blocks to the most empty device, should
> it not?  There is almost no space on devids 1 and 2, so it would not copy
> any chunks there.
>
> I'm starting to think this is a bug, but I'll keep plugging.

Could be a bug. A three-drive raid1 of different sizes is somewhat uncommon,
so it's possible it's hit an edge case somehow. Qu will know more about how
to find out why it's not allocating mostly to the larger drive. The eventual
workaround might end up being to convert data chunks to single, then convert
back to raid1. But before doing that it'd be better to find out why it's not
doing the right thing the normal way.

--
Chris Murphy

^ permalink raw reply	[flat|nested] 35+ messages in thread
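
(For reference, the two workflows Chris is distinguishing look roughly like
this; the device paths and <devid> are placeholders, and per the discussion
above only the replace path inherits the old device's size and needs a
follow-up resize:)

    # workflow 1: in-place replacement, then grow to the new device's full size
    btrfs replace start /dev/OLD /dev/NEW /local
    btrfs filesystem resize <devid>:max /local

    # workflow 2: add the new device, then remove the old one (no resize needed)
    btrfs device add /dev/NEW /local
    btrfs device delete /dev/OLD /local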
* Re: RAID-1 refuses to balance large drive
  2016-03-23  6:17 ` Chris Murphy
@ 2016-03-23 16:51   ` Brad Templeton
  2016-03-23 18:34     ` Chris Murphy
  0 siblings, 1 reply; 35+ messages in thread
From: Brad Templeton @ 2016-03-23 16:51 UTC (permalink / raw)
  To: Chris Murphy, Btrfs BTRFS, Qu Wenruo

Thanks for the assist.  To reiterate what I said in private:

a) I am fairly sure I swapped drives by adding the 6TB drive and then
removing the 2TB drive, which would not have made the 6TB think it was only
2TB.  The btrfs statistics commands have shown from the beginning the size
of the device as 6TB, and that after the remove it had 4TB unallocated.

b) Even if my memory is wrong and I did a replace (that's not even
documented in the wiki page on multiple devices, so I don't think I had
heard of it), I have since done a resize to "max" on all devices, and still
the balance moves nothing.  It says it processes almost all the blocks it
sees, but nothing changes.

So I am looking for other options, or if people have commands I might
execute to diagnose this (as it seems to be a flaw in balance), let me know.

Some options remaining open to me:

a) I could re-add the 2TB device, which is still there.  Then balance again,
which hopefully would move a lot of stuff.  Then remove it again, and
hopefully the new stuff would distribute mostly to the large drive.  Then I
could try balance again.

b) It was suggested I could (with a good backup) convert the drive to
non-RAID1 to free up tons of space and then re-convert.  What's the precise
procedure for that?  Perhaps I can do it with a limit to see how it works as
an experiment?  Is there any way to specifically target the chunks that have
their two copies on the 2 smaller drives for conversion?

c) Finally, I could take a full-full backup (my normal backups don't bother
with cached stuff and certain other things that you can recover) and take
the system down for a while to just wipe and restore the volumes.  That
doesn't find the bug, however.

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: RAID-1 refuses to balance large drive
  2016-03-23 16:51 ` Brad Templeton
@ 2016-03-23 18:34   ` Chris Murphy
  2016-03-23 19:10     ` Brad Templeton
                        ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Chris Murphy @ 2016-03-23 18:34 UTC (permalink / raw)
  To: Brad Templeton; +Cc: Chris Murphy, Btrfs BTRFS, Qu Wenruo

On Wed, Mar 23, 2016 at 10:51 AM, Brad Templeton <bradtem@gmail.com> wrote:
> a) I am fairly sure I swapped drives by adding the 6TB drive and then
> removing the 2TB drive, which would not have made the 6TB think it was only
> 2TB.  The btrfs statistics commands have shown from the beginning the size
> of the device as 6TB, and that after the remove it had 4TB unallocated.

I agree this seems to be consistent with what's been reported.

> So I am looking for other options, or if people have commands I might
> execute to diagnose this (as it seems to be a flaw in balance), let me know.

What version of btrfs-progs is this? I'm vaguely curious what 'btrfs check'
reports (without --repair). Any version is OK, but it's better to use
something fairly recent since the check code continues to change a lot.

Another thing you could try is a newer kernel. Maybe there's a related bug
in 4.2.0. I think it's more likely this is just an edge-case bug that's
always been there, but it's valuable to know if recent kernels exhibit the
problem.

And before proceeding with a change in layout (converting to another
profile), I suggest taking an image of the metadata with btrfs-image; it
might come in handy for a developer.

> a) I could re-add the 2TB device, which is still there.  Then balance
> again, which hopefully would move a lot of stuff.  Then remove it again,
> and hopefully the new stuff would distribute mostly to the large drive.

Yeah, to do this will require -f to wipe the signature info from that drive
when you add it. But I don't think this is a case of needing more free
space; I think it might be due to the odd number of drives that are also
fairly different in size.

But then what happens when you delete the 2TB drive after the balance? Do
you end up right back in this same situation?

> b) It was suggested I could (with a good backup) convert the drive to
> non-RAID1 to free up tons of space and then re-convert.  What's the precise
> procedure for that?

btrfs balance -dconvert=single -mconvert=single -f   ## you have to use -f
                                                     ## to force reduction
                                                     ## in redundancy
btrfs balance -dconvert=raid1 -mconvert=raid1

There is the devid= filter, but I'm not sure of the consequences of limiting
the conversion to two of three devices; that's kinda confusing and is
sufficiently an edge case that I wonder how many bugs you're looking to find
today? :-)

> c) Finally, I could take a full-full backup (my normal backups don't bother
> with cached stuff and certain other things that you can recover) and take
> the system down for a while to just wipe and restore the volumes.  That
> doesn't find the bug, however.

I'd have the full backup no matter what choice you make. At any time, for
any reason, any filesystem can face-plant without warning.

But yes, this should definitely work, or else you've definitely found a bug.
Finding the bug in your current scenario is harder because the history of
this volume makes it really non-deterministic, whereas if you start with a
3-disk volume at mkfs time and then reproduce this problem, for sure it's a
bug. And fairly straightforward to reproduce.

I still recommend a newer kernel and progs though, just because there's no
work being done on 4.2 anymore. I suggest 4.4.6 and 4.4.1 progs. And then if
you reproduce it, it's not just a bug, it's a current bug.

--
Chris Murphy

^ permalink raw reply	[flat|nested] 35+ messages in thread
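
(A slightly fuller form of the conversion Chris sketches above, assuming the
/local mount point; reducing redundancy requires -f, and the second command
is run once the first balance has completed:)

    btrfs balance start -dconvert=single -mconvert=single -f /local
    btrfs balance start -dconvert=raid1 -mconvert=raid1 /local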
* Re: RAID-1 refuses to balance large drive
  2016-03-23 18:34 ` Chris Murphy
@ 2016-03-23 19:10   ` Brad Templeton
  2016-03-23 19:27     ` Alexander Fougner
                        ` (2 more replies)
  2016-03-23 22:28     ` Duncan
  2016-03-24  7:08     ` Andrew Vaughan
  2 siblings, 3 replies; 35+ messages in thread
From: Brad Templeton @ 2016-03-23 19:10 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS, Qu Wenruo

It is Ubuntu Wily, which is kernel 4.2 and btrfs-progs 0.4.  I will upgrade
to Xenial in April, but probably not before; I don't have days to spend on
this.  Is there a fairly safe PPA to pull 4.4 or 4.5 from?  In olden days I
would patch and build my kernels from source, but I just don't have time for
all the long-term sysadmin burden that creates any more.

Also, I presume that if this is a bug, it's in btrfs-progs, though the new
one presumably needs a newer kernel too.

I am surprised to hear it said that having mixed sizes is an odd case.  That
was actually one of the more compelling features of btrfs that made me
switch from mdadm, LVM and the rest.  I presumed most people were the same.
You need more space, you go out and buy a new drive, and of course the new
drive is bigger than the old drives you bought, because they always get
bigger.  Under mdadm the bigger drive still helped, because it replaced a
smaller drive, the one that was holding the RAID back, but you didn't get to
use all of the big drive until a year later when you had upgraded them all.
In the meantime you used the extra space in other RAIDs (for example, a
raid-5 plus a raid-1 on the 2 bigger drives), or you used the extra space as
non-RAID space, i.e. space for static stuff that has offline backups.  In
fact, most of my storage is of that class (photo archives, reciprocal
backups of other systems) where RAID is not needed.

So the long story is, I think most home users are likely to always have
different sizes and want their FS to treat it well.

Since 6TB is a relatively new size, I wonder if that plays a role.  More
than 4TB of free space to balance into, could that confuse it?

Off to do a backup (good idea anyway).

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: RAID-1 refuses to balance large drive
  2016-03-23 19:10 ` Brad Templeton
@ 2016-03-23 19:27   ` Alexander Fougner
  2016-03-23 19:33     ` Chris Murphy
  2016-03-23 21:54     ` Duncan
  2 siblings, 0 replies; 35+ messages in thread
From: Alexander Fougner @ 2016-03-23 19:27 UTC (permalink / raw)
  To: bradtem; +Cc: Chris Murphy, Btrfs BTRFS, Qu Wenruo

2016-03-23 20:10 GMT+01:00 Brad Templeton <bradtem@gmail.com>:
> It is Ubuntu Wily, which is kernel 4.2 and btrfs-progs 0.4.  I will upgrade
> to Xenial in April, but probably not before; I don't have days to spend on
> this.  Is there a fairly safe PPA to pull 4.4 or 4.5 from?

Use the mainline PPA: http://kernel.ubuntu.com/~kernel-ppa/mainline/
Instructions: https://wiki.ubuntu.com/Kernel/MainlineBuilds

You can also find a newer btrfs-progs .deb here:
launchpad.net/ubuntu/+source/btrfs-tools

^ permalink raw reply	[flat|nested] 35+ messages in thread
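
(A hedged sketch of installing a mainline build by hand; the version and
file names are placeholders only, pick the actual .deb files listed on the
mainline page for the build you want:)

    # download the matching linux-headers and linux-image .debs, then:
    sudo dpkg -i linux-headers-4.4.6-*.deb linux-image-4.4.6-*.deb
    sudo reboot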
* Re: RAID-1 refuses to balance large drive
  2016-03-23 19:10 ` Brad Templeton
  2016-03-23 19:27   ` Alexander Fougner
@ 2016-03-23 19:33   ` Chris Murphy
  2016-03-24  1:59     ` Qu Wenruo
  2016-03-25 13:16     ` Patrik Lundquist
  1 sibling, 2 replies; 35+ messages in thread
From: Chris Murphy @ 2016-03-23 19:33 UTC (permalink / raw)
  To: Brad Templeton; +Cc: Chris Murphy, Btrfs BTRFS, Qu Wenruo

On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton <bradtem@gmail.com> wrote:
> It is Ubuntu Wily, which is kernel 4.2 and btrfs-progs 0.4.  I will upgrade
> to Xenial in April, but probably not before; I don't have days to spend on
> this.  Is there a fairly safe PPA to pull 4.4 or 4.5 from?

I'm not sure.

> Also, I presume that if this is a bug, it's in btrfs-progs, though the new
> one presumably needs a newer kernel too.

No, you can mix and match progs and kernel versions. You just don't get new
features if you don't have a new kernel.

But the issue is that the balance code is all in the kernel. It's activated
by user-space tools, but it's all actually done by kernel code.

> I am surprised to hear it said that having mixed sizes is an odd case.

Not odd as in wrong, just uncommon compared to other arrangements being
tested.

> That was actually one of the more compelling features of btrfs that made me
> switch from mdadm, LVM and the rest.  I presumed most people were the same.
> You need more space, you go out and buy a new drive, and of course the new
> drive is bigger than the old drives you bought, because they always get
> bigger.

Of course, and I'm not saying it shouldn't work. The central problem here is
we don't even know what the problem really is; we only know the
manifestation of the problem isn't the desired or expected outcome. And how
to find out the cause is different than how to fix it.

> So the long story is, I think most home users are likely to always have
> different sizes and want their FS to treat it well.

Yes, of course. And at the expense of getting a frownie face....

"Btrfs is under heavy development, and is not suitable for
any uses other than benchmarking and review."
https://www.kernel.org/doc/Documentation/filesystems/btrfs.txt

Despite that disclosure, what you're describing is not what I'd expect and
not what I've previously experienced. But I haven't had three different
sized drives, they weren't particularly full, and I don't know if you
started with three from the outset at mkfs time or if this is the result of
two drives with a third added on later, etc. So the nature of file systems
is actually really complicated, and it's normal for there to be regressions
- and maybe this is a regression, hard to say with available information.

> Since 6TB is a relatively new size, I wonder if that plays a role.  More
> than 4TB of free space to balance into, could that confuse it?

Seems unlikely.

--
Chris Murphy

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: RAID-1 refuses to balance large drive
  2016-03-23 19:33 ` Chris Murphy
@ 2016-03-24  1:59   ` Qu Wenruo
  2016-03-24  2:13     ` Brad Templeton
  2016-03-25 13:16     ` Patrik Lundquist
  1 sibling, 1 reply; 35+ messages in thread
From: Qu Wenruo @ 2016-03-24 1:59 UTC (permalink / raw)
  To: Chris Murphy, Brad Templeton; +Cc: Btrfs BTRFS

Chris Murphy wrote on 2016/03/23 13:33 -0600:
> Of course, and I'm not saying it shouldn't work. The central problem here
> is we don't even know what the problem really is; we only know the
> manifestation of the problem isn't the desired or expected outcome. And
> how to find out the cause is different than how to fix it.

About the chunk allocation problem, I'd like to get a clear view of the
whole disk layout first.

What's the final disk layout?
Is it a 4T + 3T + 6T + 20G layout?

If so, I'll say that in that case only a full re-convert to single may help,
as there is not enough space to allocate new raid1 chunks to balance into.

As Chris Murphy has already mentioned, btrfs chunk allocation has some
limitations, although it is already more flexible than mdadm.

Btrfs chunk allocation chooses the devices with the most unallocated space,
and for raid1 it will always pick 2 different devices for each allocation.

This does let btrfs raid1 use more space, in a more flexible way, than mdadm
raid1.  But that only works if you start from scratch.

I'll explain that case first.

1) 6T and 4T devices only stage: allocate 1T of raid1 chunks.
   As the 6T and 4T devices have the most unallocated space, the first 1T of
   raid1 chunks will be allocated from them.
   Remaining space: 3/3/5

2) 6T and 3/4 switch stage: allocate 4T of raid1 chunks.
   After stage 1) we have 3/3/5 remaining space, so btrfs will pick space
   from the 5T remaining (the 6T device) and switch between the other two
   devices with 3T remaining.
   This brings the remaining space to 1/1/1.

3) Fake-even allocation stage: allocate 1T of raid1 chunks.
   Now all devices have the same unallocated space, and there are 3 devices,
   so we can't really balance all chunks across them.  As we must and will
   only select 2 devices, in this stage there will be 1T unallocated that is
   never used.

After all, you will get 1 + 4 + 1 = 6T, still smaller than (3 + 4 + 6) / 2 =
6.5T.

Now let's talk about your 3 + 4 + 6 case.

In your initial state, the 3T and 4T devices are already filled up.
Even though your 6T device has about 4T of available space, it's only 1
device, not the 2 which raid1 needs.

So there is no space for balance to allocate a new raid1 chunk.  The extra
20G is so small that it makes almost no difference.

The convert to single and then back to raid1 will do its job, partly.
But according to another report from the mailing list, the result won't be
perfectly even, even when the reporter used devices of all the same size.

So to conclude:

1) Btrfs will use most of the devices' space for raid1.
2) But 1) only happens if one fills the btrfs from scratch.
3) For an already-filled case, converting to single and then converting back
   will work, but not perfectly.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 35+ messages in thread
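
(To see whether a balance actually has anywhere to put new raid1 chunks, the
per-device unallocated figures are the thing to watch; a sketch assuming the
/local mount point:)

    # per-device allocated/unallocated breakdown; a raid1 balance needs
    # unallocated space on at least two devices
    btrfs device usage /local
    # progress of a currently running balance
    btrfs balance status /local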
* Re: RAID-1 refuses to balance large drive
  2016-03-24  1:59 ` Qu Wenruo
@ 2016-03-24  2:13   ` Brad Templeton
  2016-03-24  2:33     ` Qu Wenruo
  0 siblings, 1 reply; 35+ messages in thread
From: Brad Templeton @ 2016-03-24 2:13 UTC (permalink / raw)
  To: Qu Wenruo, Chris Murphy; +Cc: Btrfs BTRFS

On 03/23/2016 06:59 PM, Qu Wenruo wrote:
> 1) 6T and 4T devices only stage: allocate 1T of raid1 chunks.
>    As the 6T and 4T devices have the most unallocated space, the first 1T
>    of raid1 chunks will be allocated from them.
>    Remaining space: 3/3/5

This stage never existed.  We had a 4 + 3 + 2 stage, which was low-ish on
space but not full.  I mean it had hundreds of GB free.

Then we had 4 + 3 + 6 + 2, but did not add more files or balance.

Then we had a remove of the 2, which caused, as expected, all the chunks on
the 2TB drive to be copied to the 6TB drive, as it was the emptiest drive.

Then we had a balance.  The balance (I would have expected) would have moved
chunks found on both the 3 and the 4, taking one of them and moving it to
the 6, generally alternating between taking ones from the 3 and the 4.  I
can see no reason this should not work even if the 3 and 4 were almost
entirely full, but they were not.  But this did not happen.

> So there is no space for balance to allocate a new raid1 chunk.  The extra
> 20G is so small that it makes almost no difference.

Yes, it was added as an experiment on the suggestion of somebody on the IRC
channel.  I will be rid of it soon.  Still, it seems to me that the lack of
space, even after I filled the disks, should not interfere with the
balance's ability to move chunks which are found on both the 3 and the 4 so
that one remains and one goes to the 6.  This action needs no spare space.
I presume the current algorithm perhaps does not work this way?

My next plan is to add the 2TB back.  If I am right, balance will move
chunks from the 3 and 4 to the 2TB, but it should not move any from the 6TB
because it has so much space.  Likewise, when I re-remove the 2TB, all its
chunks should move to the 6TB, and I will at least be in a usable state.

Or is the single approach faster?

^ permalink raw reply	[flat|nested] 35+ messages in thread
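
(For what it's worth, the re-add plan above would be roughly the following
sequence; the 2TB device path is a placeholder, and -f is needed because the
drive still carries its old btrfs signature:)

    btrfs device add -f /dev/OLD2TB /local    # re-add the old 2TB drive
    btrfs balance start /local                # rebalance across four devices
    btrfs device delete /dev/OLD2TB /local    # then migrate its chunks off again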
* Re: RAID-1 refuses to balance large drive
  2016-03-24  2:13 ` Brad Templeton
@ 2016-03-24  2:33   ` Qu Wenruo
  2016-03-24  2:49     ` Brad Templeton
  0 siblings, 1 reply; 35+ messages in thread
From: Qu Wenruo @ 2016-03-24 2:33 UTC (permalink / raw)
  To: bradtem, Chris Murphy; +Cc: Btrfs BTRFS

Brad Templeton wrote on 2016/03/23 19:13 -0700:
> This stage never existed.  We had a 4 + 3 + 2 stage, which was low-ish on
> space but not full.  I mean it had hundreds of GB free.

The stages I talked about are only for filling a btrfs from scratch with
3/4/6 devices -- just an example to explain how btrfs allocates space on
uneven devices.

> Still, it seems to me that the lack of space, even after I filled the
> disks, should not interfere with the balance's ability to move chunks
> which are found on both the 3 and the 4 so that one remains and one goes
> to the 6.  This action needs no spare space.  I presume the current
> algorithm perhaps does not work this way?

No, balance does not work like that.
Most users consider balance to be moving data, which is only partly right.
The fact is, balance is copy-and-delete, and it needs spare space.

That means you must have enough space for the extents you are balancing;
btrfs will copy them, update the references, and then delete the old data
(with its block group).

So to balance data on already-filled devices, btrfs needs to find space for
that data first, which for RAID1 means 2 devices with unallocated space.

And in your case you only have 1 device with unallocated space, so there is
no space to balance into.

> My next plan is to add the 2TB back.  If I am right, balance will move
> chunks from the 3 and 4 to the 2TB,

Not only to the 2TB, but to the 2TB and the 6TB.  Never forget that RAID1
needs 2 devices.  And if the 2TB is filled and the 3/4 have free space, it's
also possible for chunks to go to the 3/4 devices.

That will free space on the already filled-up devices, but it's still not
enough to make the space even.  You may need to balance several times (maybe
10+) to make the space a little more even, as balance won't re-balance any
chunk which was created by that same balance (or balance would loop
infinitely).

> but it should not move any from the 6TB because it has so much space.

That's also wrong.

Whether balance will move data from the 6TB device is determined only by
whether the source chunk has a stripe on the 6TB device and whether there is
enough space to copy it to.

Balance, unlike chunk allocation, is much simpler, with no complicated space
calculation:

1) Check the current chunk.
   If the chunk is out of the chunk range (beyond the last chunk, which
   means we are done and the current chunk is a newly created one), then
   balance is finished.
2) Check if we have enough space for the current chunk, including creating
   new chunks.
3) Copy all extents in this chunk to the new location.
4) Update the references of all extents to point to the new location, and
   free the old extents.
5) Go to the next chunk (in bytenr order).

So it's possible that some data on the 6TB device is moved to the 6TB device
again, or to the empty 2TB device.  It's the chunk allocator which ensures
the new (destination) chunk is allocated from the 6T and the empty 2T
devices.

> Likewise, when I re-remove the 2TB, all its chunks should move to the 6TB,
> and I will at least be in a usable state.
>
> Or is the single approach faster?

As mentioned, it's not that easy.  The 2TB device is not a silver bullet at
all.  The re-convert method is the preferred one, although it's not perfect.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: RAID-1 refuses to balance large drive 2016-03-24 2:33 ` Qu Wenruo @ 2016-03-24 2:49 ` Brad Templeton 2016-03-24 3:44 ` Chris Murphy ` (2 more replies) 0 siblings, 3 replies; 35+ messages in thread From: Brad Templeton @ 2016-03-24 2:49 UTC (permalink / raw) To: Qu Wenruo, Chris Murphy; +Cc: Btrfs BTRFS On 03/23/2016 07:33 PM, Qu Wenruo wrote: > > The stage I talked about is only for you fill btrfs from scratch, with 3 > 4 6 devices. > > Just as an example to explain how btrfs allocated space on un-even devices. > >> >> Then we had 4 + 3 + 6 + 2, but did not add more files or balance. >> >> Then we had a remove of the 2, which caused, as expected, all the chunks >> on the 2TB drive to be copied to the 6TB drive, as it was the most empty >> drive. >> >> Then we had a balance. The balance (I would have expected) would have >> moved chunks found on both 3 and 4, taking one of them and moving it to >> the 6. Generally alternating taking ones from the 3 and 4. I can see >> no reason this should not work even if 3 and 4 are almost entirely full, >> but they were not. >> But this did not happen. >> >>> >>> 2) 6T and 3/4 switch stage: Allocate 4T Raid1 chunk. >>> After stage 1), we have 3/3/5 remaining space, then btrfs will pick >>> space from 5T remaining(6T devices), and switch between the other 3T >>> remaining one. >>> >>> Cause the remaining space to be 1/1/1. >>> >>> 3) Fake-even allocation stage: Allocate 1T raid chunk. >>> Now all devices have the same unallocated space, and there are 3 >>> devices, we can't really balance all chunks across them. >>> As we must and will only select 2 devices, in this stage, there will >>> be 1T unallocated and never be used. >>> >>> After all, you will get 1 +4 +1 = 6T, still smaller than (3 + 4 +6 ) /2 >>> = 6.5T >>> >>> Now let's talk about your 3 + 4 + 6 case. >>> >>> For your initial state, 3 and 4 T devices is already filled up. >>> Even your 6T device have about 4T available space, it's only 1 device, >>> not 2 which raid1 needs. >>> >>> So, no space for balance to allocate a new raid chunk. The extra 20G is >>> so small that almost makes no sence. >> >> Yes, it was added as an experiment on the suggestion of somebody on the >> IRC channel. I will be rid of it soon. Still, it seems to me that the >> lack of space even after I filled the disks should not interfere with >> the balance's ability to move chunks which are found on both 3 and 4 so >> that one remains and one goes to the 6. This action needs no spare >> space. Now I presume the current algorithm perhaps does not work >> this way? > > No, balance is not working like that. > Although most user consider balance is moving data, which is partly right. > The fact is, balance is, copy-and-delete. And it needs spare space. > > Means you must have enough space for the extents you are balancing, then > btrfs will copy them, update reference, and then delete old data (with > its block group). > > So for balancing data in already filled device, btrfs needs to find > space for them first. > Which will need 2 devices with unallocated space for RAID1. > > And in you case, you only have 1 devices with unallocated space, so no > space to balance. Ah. I would class this as a bug, or at least a non-optimal design. If I understand, you say it tries to move both of the matching chunks to new homes. This makes no sense if there are 3 drives because it is assured that one chunk is staying on the same drive. 
Even with 4 or more drives, where this could make sense, in fact it would still be wise to attempt to move only one of the pair of chunks, and then move the other if that is also a good idea. > > >> >> My next plan is to add the 2tb back. If I am right, balance will move >> chunks from 3 and 4 to the 2TB, > > Not only to 2TB, but to 2TB and 6TB. Never forgot that RAID1 needs 2 > devices. > And if 2TB is filled and 3/4 and free space, it's also possible to 3/4 > devices. > > That will free 2TB in already filled up devices. But that's still not > enough to get space even. > > You may need to balance several times(maybe 10+) to make space a little > even, as balance won't balance any chunk which is created by balance. > (Or balance will loop infinitely). Now I understand -- I had not thought it would try to move 2 when that's so obviously wrong on a 3-drive, and so I was not thinking of the general case. So I can now calculate that if I add the 2TB, in an ideal situation, it will perhaps get 1TB of chunks and the 6TB will get 1TB of chunks and then the 4 drives will have 3 with 1TB free, and the 6TB will have 3TB free. Then when I remove the 2TB, the 6TB should get all its chunks and will have 2TB free and the other two 1TB free and that's actually the right situation as all new blocks will appear on the 6TB and one of the other two drives. I don't want to keep 4 drives because small drives consume power for little, better to move them to other purposes (offline backup etc.) In the algorithm below, does "chunk" refer to both the redundant copies of the data, or just to one of them? I am guessing my misunderstanding may come from it referring to both, and moving both? The ability of it to move within the same device you describe is presumably there for combining things together to a chunk, but it appears it slows down the drive rebalancing plan. Thanks for your explanations. > >> but it should not move any from the 6TB >> because it has so much space. > > That's also wrong. > Whether balance will move data from 6TB devices, is only determined by > if the src chunk has stripe on 6TB devices and there is enough space to > copy them to. > > Balance, unlike chunk allocation, is much simple and no complicated > space calculation. > > 1) Check current chunk > If the chunk is out of chunk range (beyond last chunk, which means > we are done and current chunk is newly created one) > then we finish balance. > > 2) Check if we have enough space for current chunk. > Including creating new chunks. > > 3) Copy all exntets in this chunk to new location > > 4) Update reference of all extents to point to new location > And free old extents. > > 5) Goto next chunk.(bytenr order) > > So, it's possible that some data in 6TB devices is moved to 6TB again, > or to the empty 2TB devices. > > It's chunk allocator which ensure the new chunk (destination chunk) is > allocated from 6T and empty 2T devices. > >> LIkewise, when I re-remove the 2tb, all >> its chunks should move to the 6tb, and I will be at least in a usable >> state. >> >> Or is the single approach faster? > > As mentioned, not that easy. The 2Tb devices is not the silver bullet at > all. > > Re-convert method is the preferred one, although it's not perfect. > > Thanks, > Qu >> >>> >>> >>> The convert to single then back to raid1, will do its job partly. >>> But according to other report from mail list. >>> The result won't be perfect even, even the reporter uses devices with >>> all same size. 
>>> >>> >>> So to conclude: >>> >>> 1) Btrfs will use most of devices space for raid1. >>> 2) 1) only happens if one fills btrfs from scratch >>> 3) For already filled case, convert to single then convert back will >>> work, but not perfectly. >>> >>> Thanks, >>> Qu >>> >>>> >>>> >>>> >>>>> Under mdadm the bigger drive >>>>> still helped, because it replaced at smaller drive, the one that was >>>>> holding the RAID back, but you didn't get to use all the big drive >>>>> until >>>>> a year later when you had upgraded them all. In the meantime you used >>>>> the extra space in other RAIDs. (For example, a raid-5 plus a >>>>> raid-1 on >>>>> the 2 bigger drives) Or you used the extra space as non-RAID space, >>>>> ie. >>>>> space for static stuff that has offline backups. In fact, most of my >>>>> storage is of that class (photo archives, reciprocal backups of other >>>>> systems) where RAID is not needed. >>>>> >>>>> So the long story is, I think most home users are likely to always >>>>> have >>>>> different sizes and want their FS to treat it well. >>>> >>>> Yes of course. And at the expense of getting a frownie face.... >>>> >>>> "Btrfs is under heavy development, and is not suitable for >>>> any uses other than benchmarking and review." >>>> https://www.kernel.org/doc/Documentation/filesystems/btrfs.txt >>>> >>>> Despite that disclosure, what you're describing is not what I'd expect >>>> and not what I've previously experienced. But I haven't had three >>>> different sized drives, and they weren't particularly full, and I >>>> don't know if you started with three from the outset at mkfs time or >>>> if this is the result of two drives with a third added on later, etc. >>>> So the nature of file systems is actually really complicated and it's >>>> normal for there to be regressions - and maybe this is a regression, >>>> hard to say with available information. >>>> >>>> >>>> >>>>> Since 6TB is a relatively new size, I wonder if that plays a role. >>>>> More >>>>> than 4TB of free space to balance into, could that confuse it? >>>> >>>> Seems unlikely. >>>> >>>> >>> >> >> > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: RAID-1 refuses to balance large drive 2016-03-24 2:49 ` Brad Templeton @ 2016-03-24 3:44 ` Chris Murphy 2016-03-24 3:46 ` Qu Wenruo 2016-03-24 6:11 ` Duncan 2 siblings, 0 replies; 35+ messages in thread From: Chris Murphy @ 2016-03-24 3:44 UTC (permalink / raw) To: Brad Templeton; +Cc: Qu Wenruo, Chris Murphy, Btrfs BTRFS On Wed, Mar 23, 2016 at 8:49 PM, Brad Templeton <bradtem@gmail.com> wrote: > On 03/23/2016 07:33 PM, Qu Wenruo wrote: >> >> No, balance is not working like that. >> Although most user consider balance is moving data, which is partly right. >> The fact is, balance is, copy-and-delete. And it needs spare space. >> >> Means you must have enough space for the extents you are balancing, then >> btrfs will copy them, update reference, and then delete old data (with >> its block group). >> >> So for balancing data in already filled device, btrfs needs to find >> space for them first. >> Which will need 2 devices with unallocated space for RAID1. >> >> And in you case, you only have 1 devices with unallocated space, so no >> space to balance. > > Ah. I would class this as a bug, or at least a non-optimal design. If > I understand, you say it tries to move both of the matching chunks to > new homes. This makes no sense if there are 3 drives because it is > assured that one chunk is staying on the same drive. Even with 4 or > more drives, where this could make sense, in fact it would still be wise > to attempt to move only one of the pair of chunks, and then move the > other if that is also a good idea. In a separate thread, it's observed that balance code is getting complicated and it's probably important that it not be too smart for itself. The thing to understand is that chunks are a contiguous range of physical sectors. What's really being copied are extents in those chunks. And the balance not only rewrites extents but it tries to collect them together to efficiently use the chunk space. The Btrfs chunk isn't like an md chunk. > > >> >> >>> >>> My next plan is to add the 2tb back. If I am right, balance will move >>> chunks from 3 and 4 to the 2TB, >> >> Not only to 2TB, but to 2TB and 6TB. Never forgot that RAID1 needs 2 >> devices. >> And if 2TB is filled and 3/4 and free space, it's also possible to 3/4 >> devices. >> >> That will free 2TB in already filled up devices. But that's still not >> enough to get space even. >> >> You may need to balance several times(maybe 10+) to make space a little >> even, as balance won't balance any chunk which is created by balance. >> (Or balance will loop infinitely). > > Now I understand -- I had not thought it would try to move 2 when that's > so obviously wrong on a 3-drive, and so I was not thinking of the > general case. So I can now calculate that if I add the 2TB, in an ideal > situation, it will perhaps get 1TB of chunks and the 6TB will get 1TB of > chunks and then the 4 drives will have 3 with 1TB free, and the 6TB will > have 3TB free. The problem is that you have two devices totally full now, devid1 and devid2. So it's not certain it's going to start just copying chunks off those drives. Whatever it does, it does on both chunk copies. It might be moving them. It might be packing them more efficiently with extents. No deallocation of a chunk can happen until it's empty. So for two full drives it's difficult to see how this gets fixed just with a regular balance. I think you have to go to single profile... OR... Add the 2TB. Remove the 6TB and wait. 
devid 3 size 5.43TiB used 1.42TiB path /dev/sdg2 This suggests only 1.4TiB is in use on the 6TB drive, so it should be possible for those chunks to get moved to the 2TB drive. Now you have an empty 6TB, and you still have a (very full) raid1 with all the data. mkfs a new volume on the 6TB, then btrfs send/receive to get all the data onto the 6TB drive. "Data,RAID1: Size:3.87TiB, Used:3.87TiB" suggests only 4TB of data, so the 6TB can hold all of it. Now you can umount the old volume, force-add the 3TB and 4TB to the new 6TB volume, and balance with -dconvert=raid1 -mconvert=raid1. The worst-case scenario is that the 6TB drive dies during the conversion, in which case the new volume could be totally broken and you have to go to backup. But otherwise, it's a bit less risky than two balances to and from the single profile across three or even four drives. -- Chris Murphy ^ permalink raw reply [flat|nested] 35+ messages in thread
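As a concrete sketch of the migration Chris Murphy outlines above -- the 6TB, 4TB and 3TB device nodes are taken from the earlier 'fi show' output, while the 2TB node, the mount points, the snapshot name and the assumption that all data lives in one subvolume are illustrative, not from the thread -- the steps might look like this:

# 1) Temporarily add the 2TB, then empty the 6TB (device delete waits until done)
btrfs device add /dev/sd_2tb /local
btrfs device delete /dev/sdg2 /local

# 2) Make a fresh filesystem on the freed 6TB and mount it
mkfs.btrfs -f -L butter-new /dev/sdg2
mkdir -p /mnt/new && mount /dev/sdg2 /mnt/new

# 3) Copy the data with send/receive (send needs a read-only snapshot)
btrfs subvolume snapshot -r /local /local/migrate-ro
btrfs send /local/migrate-ro | btrfs receive /mnt/new

# 4) Retire the old volume, force-add its drives, and restore RAID1
umount /local
btrfs device add -f /dev/sdi2 /dev/sdh /mnt/new
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/new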
* Re: RAID-1 refuses to balance large drive 2016-03-24 2:49 ` Brad Templeton 2016-03-24 3:44 ` Chris Murphy @ 2016-03-24 3:46 ` Qu Wenruo 2016-03-24 6:11 ` Duncan 2 siblings, 0 replies; 35+ messages in thread From: Qu Wenruo @ 2016-03-24 3:46 UTC (permalink / raw) To: bradtem, Chris Murphy; +Cc: Btrfs BTRFS Brad Templeton wrote on 2016/03/23 19:49 -0700: > > > On 03/23/2016 07:33 PM, Qu Wenruo wrote: > >> >> The stage I talked about is only for you fill btrfs from scratch, with 3 >> 4 6 devices. >> >> Just as an example to explain how btrfs allocated space on un-even devices. >> >>> >>> Then we had 4 + 3 + 6 + 2, but did not add more files or balance. >>> >>> Then we had a remove of the 2, which caused, as expected, all the chunks >>> on the 2TB drive to be copied to the 6TB drive, as it was the most empty >>> drive. >>> >>> Then we had a balance. The balance (I would have expected) would have >>> moved chunks found on both 3 and 4, taking one of them and moving it to >>> the 6. Generally alternating taking ones from the 3 and 4. I can see >>> no reason this should not work even if 3 and 4 are almost entirely full, >>> but they were not. >>> But this did not happen. >>> >>>> >>>> 2) 6T and 3/4 switch stage: Allocate 4T Raid1 chunk. >>>> After stage 1), we have 3/3/5 remaining space, then btrfs will pick >>>> space from 5T remaining(6T devices), and switch between the other 3T >>>> remaining one. >>>> >>>> Cause the remaining space to be 1/1/1. >>>> >>>> 3) Fake-even allocation stage: Allocate 1T raid chunk. >>>> Now all devices have the same unallocated space, and there are 3 >>>> devices, we can't really balance all chunks across them. >>>> As we must and will only select 2 devices, in this stage, there will >>>> be 1T unallocated and never be used. >>>> >>>> After all, you will get 1 +4 +1 = 6T, still smaller than (3 + 4 +6 ) /2 >>>> = 6.5T >>>> >>>> Now let's talk about your 3 + 4 + 6 case. >>>> >>>> For your initial state, 3 and 4 T devices is already filled up. >>>> Even your 6T device have about 4T available space, it's only 1 device, >>>> not 2 which raid1 needs. >>>> >>>> So, no space for balance to allocate a new raid chunk. The extra 20G is >>>> so small that almost makes no sence. >>> >>> Yes, it was added as an experiment on the suggestion of somebody on the >>> IRC channel. I will be rid of it soon. Still, it seems to me that the >>> lack of space even after I filled the disks should not interfere with >>> the balance's ability to move chunks which are found on both 3 and 4 so >>> that one remains and one goes to the 6. This action needs no spare >>> space. Now I presume the current algorithm perhaps does not work >>> this way? >> >> No, balance is not working like that. >> Although most user consider balance is moving data, which is partly right. >> The fact is, balance is, copy-and-delete. And it needs spare space. >> >> Means you must have enough space for the extents you are balancing, then >> btrfs will copy them, update reference, and then delete old data (with >> its block group). >> >> So for balancing data in already filled device, btrfs needs to find >> space for them first. >> Which will need 2 devices with unallocated space for RAID1. >> >> And in you case, you only have 1 devices with unallocated space, so no >> space to balance. > > Ah. I would class this as a bug, or at least a non-optimal design. If > I understand, you say it tries to move both of the matching chunks to > new homes. 
This makes no sense if there are 3 drives because it is > assured that one chunk is staying on the same drive. Even with 4 or > more drives, where this could make sense, in fact it would still be wise > to attempt to move only one of the pair of chunks, and then move the > other if that is also a good idea. For only one of the pair of chunk, you mean a stripe of a chunk. And in that case, IIRC only replace is doing like that. In most case, btrfs do in chunk unit, which means that may move data inside a device. Even in that case, it's still useful. For example, there is a chunk(1G size) which only contains 1 extent(4K). Such balance can move the 4K extent into an existing chunk, and free the whole 1G chunk to allow new chunk to be created. Considering balance is not only for making chunk allocation even, but also for a lot of other use, IMHO the behavior can hardly called as a bug. > > >> >> >>> >>> My next plan is to add the 2tb back. If I am right, balance will move >>> chunks from 3 and 4 to the 2TB, >> >> Not only to 2TB, but to 2TB and 6TB. Never forgot that RAID1 needs 2 >> devices. >> And if 2TB is filled and 3/4 and free space, it's also possible to 3/4 >> devices. >> >> That will free 2TB in already filled up devices. But that's still not >> enough to get space even. >> >> You may need to balance several times(maybe 10+) to make space a little >> even, as balance won't balance any chunk which is created by balance. >> (Or balance will loop infinitely). > > Now I understand -- I had not thought it would try to move 2 when that's > so obviously wrong on a 3-drive, and so I was not thinking of the > general case. So I can now calculate that if I add the 2TB, in an ideal > situation, it will perhaps get 1TB of chunks and the 6TB will get 1TB of > chunks and then the 4 drives will have 3 with 1TB free, and the 6TB will > have 3TB free. Then when I remove the 2TB, the 6TB should get all its > chunks and will have 2TB free and the other two 1TB free and that's > actually the right situation as all new blocks will appear on the 6TB > and one of the other two drives. > > I don't want to keep 4 drives because small drives consume power for > little, better to move them to other purposes (offline backup etc.) > > In the algorithm below, does "chunk" refer to both the redundant copies > of the data, or just to one of them? Both, or more specifically, the logical data itself. The copy is normally called stripe of the chunk. In raid1 case, all the 2 stripes are just the same of the chunk contents. In btrfs' view(logical address space), Btrfs only cares which chunk covers which bytenr range. This makes a lot things easier. Like (0~1M range is never covered by any chunk) Logical bytenr: 0 1G 2G 3G 4G |<-Chunk 1->|<-Chunk 2->|<-Chunk 3->| Then how each chunk mapped to devices only needs chunk tree to consider. Most part of btrfs only need to care about the logical address space. In chunk tree, it records how chunk is mapped into real devices. Chunk1: type RAID1|DATA, length 1G stripe 0 dev1, dev bytenr XXXX stripe 1 dev2, dev bytenr YYYY Chunk2: type RAID1|METADATA, length 1G stripe 0 dev2, dev bytenr ZZZZ stripe 1 dev3, dev bytenr WWWW And what balance do, is to move all extents(if possible) inside a chunk to another place. Maybe a new chunk, or an old chunk with enough space. For example, after balancing chunk1, btrfs creates a new chunk, chunk4. Copy some extents inside chunk1 to chunk 4, some to chunk 2 and 3. 
However stripes of chunk4 can still be in dev1 and dev2, although bytenr must changed. 0 1G 2G 3G 4G 5G | |<-Chunk 2->|<-Chunk 3->|<-Chunk 4->| Chunk 4: Type RAID1|DATA length 1G stripe 0 dev1, dev bytenr Some new BYTENR stripe 0 dev2, dev bytenr Some new BYTENR > I am guessing my misunderstanding > may come from it referring to both, and moving both? It's common to consider balance as moving data, and some times the idea of "moving" leads to misunderstanding. > > The ability of it to move within the same device you describe is > presumably there for combining things together to a chunk, but it > appears it slows down the drive rebalancing plan. Personally speaking, the fastest plan is to create a 6T + 6T btrfs raid, and copy all data from old raid to them. And only add devices in pair of same size to that raid. No need to ever bother balancing (mostly). Balance is never as fast as normal copy, unfortunately. Thanks, Qu > > Thanks for your explanations. >> >>> but it should not move any from the 6TB >>> because it has so much space. >> >> That's also wrong. >> Whether balance will move data from 6TB devices, is only determined by >> if the src chunk has stripe on 6TB devices and there is enough space to >> copy them to. >> >> Balance, unlike chunk allocation, is much simple and no complicated >> space calculation. >> >> 1) Check current chunk >> If the chunk is out of chunk range (beyond last chunk, which means >> we are done and current chunk is newly created one) >> then we finish balance. >> >> 2) Check if we have enough space for current chunk. >> Including creating new chunks. >> >> 3) Copy all exntets in this chunk to new location >> >> 4) Update reference of all extents to point to new location >> And free old extents. >> >> 5) Goto next chunk.(bytenr order) >> >> So, it's possible that some data in 6TB devices is moved to 6TB again, >> or to the empty 2TB devices. >> >> It's chunk allocator which ensure the new chunk (destination chunk) is >> allocated from 6T and empty 2T devices. >> >>> LIkewise, when I re-remove the 2tb, all >>> its chunks should move to the 6tb, and I will be at least in a usable >>> state. >>> >>> Or is the single approach faster? >> >> As mentioned, not that easy. The 2Tb devices is not the silver bullet at >> all. >> >> Re-convert method is the preferred one, although it's not perfect. >> >> Thanks, >> Qu >>> >>>> >>>> >>>> The convert to single then back to raid1, will do its job partly. >>>> But according to other report from mail list. >>>> The result won't be perfect even, even the reporter uses devices with >>>> all same size. >>>> >>>> >>>> So to conclude: >>>> >>>> 1) Btrfs will use most of devices space for raid1. >>>> 2) 1) only happens if one fills btrfs from scratch >>>> 3) For already filled case, convert to single then convert back will >>>> work, but not perfectly. >>>> >>>> Thanks, >>>> Qu >>>> >>>>> >>>>> >>>>> >>>>>> Under mdadm the bigger drive >>>>>> still helped, because it replaced at smaller drive, the one that was >>>>>> holding the RAID back, but you didn't get to use all the big drive >>>>>> until >>>>>> a year later when you had upgraded them all. In the meantime you used >>>>>> the extra space in other RAIDs. (For example, a raid-5 plus a >>>>>> raid-1 on >>>>>> the 2 bigger drives) Or you used the extra space as non-RAID space, >>>>>> ie. >>>>>> space for static stuff that has offline backups. In fact, most of my >>>>>> storage is of that class (photo archives, reciprocal backups of other >>>>>> systems) where RAID is not needed. 
>>>>>> >>>>>> So the long story is, I think most home users are likely to always >>>>>> have >>>>>> different sizes and want their FS to treat it well. >>>>> >>>>> Yes of course. And at the expense of getting a frownie face.... >>>>> >>>>> "Btrfs is under heavy development, and is not suitable for >>>>> any uses other than benchmarking and review." >>>>> https://www.kernel.org/doc/Documentation/filesystems/btrfs.txt >>>>> >>>>> Despite that disclosure, what you're describing is not what I'd expect >>>>> and not what I've previously experienced. But I haven't had three >>>>> different sized drives, and they weren't particularly full, and I >>>>> don't know if you started with three from the outset at mkfs time or >>>>> if this is the result of two drives with a third added on later, etc. >>>>> So the nature of file systems is actually really complicated and it's >>>>> normal for there to be regressions - and maybe this is a regression, >>>>> hard to say with available information. >>>>> >>>>> >>>>> >>>>>> Since 6TB is a relatively new size, I wonder if that plays a role. >>>>>> More >>>>>> than 4TB of free space to balance into, could that confuse it? >>>>> >>>>> Seems unlikely. >>>>> >>>>> >>>> >>> >>> >> > > ^ permalink raw reply [flat|nested] 35+ messages in thread
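For readers who want to see the chunk-to-stripe records Qu describes above on a real filesystem, recent btrfs-progs can dump the chunk tree directly (older versions ship essentially the same output as btrfs-debug-tree); each CHUNK_ITEM lists the chunk length, its type (e.g. DATA|RAID1) and one stripe line per device copy. The device node below is from the earlier 'fi show' output and this is best run on an unmounted or read-only filesystem:

btrfs inspect-internal dump-tree -t chunk /dev/sdg2    # '-t 3' (chunk tree id) on older progs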
* Re: RAID-1 refuses to balance large drive 2016-03-24 2:49 ` Brad Templeton 2016-03-24 3:44 ` Chris Murphy 2016-03-24 3:46 ` Qu Wenruo @ 2016-03-24 6:11 ` Duncan 2 siblings, 0 replies; 35+ messages in thread From: Duncan @ 2016-03-24 6:11 UTC (permalink / raw) To: linux-btrfs Brad Templeton posted on Wed, 23 Mar 2016 19:49:00 -0700 as excerpted: > On 03/23/2016 07:33 PM, Qu Wenruo wrote: > >>> Still, it seems to me >>> that the lack of space even after I filled the disks should not >>> interfere with the balance's ability to move chunks which are found on >>> both 3 and 4 so that one remains and one goes to the 6. This action >>> needs no spare space. Now I presume the current algorithm perhaps >>> does not work this way? >> >> No, balance is not working like that. >> Although most user consider balance is moving data, which is partly >> right. The fact is, balance is, copy-and-delete. And it needs spare >> space. >> >> Means you must have enough space for the extents you are balancing, >> then btrfs will copy them, update reference, and then delete old data >> (with its block group). >> >> So for balancing data in already filled device, btrfs needs to find >> space for them first. >> Which will need 2 devices with unallocated space for RAID1. >> >> And in you case, you only have 1 devices with unallocated space, so no >> space to balance. > > Ah. I would class this as a bug, or at least a non-optimal design. If > I understand, you say it tries to move both of the matching chunks to > new homes. This makes no sense if there are 3 drives because it is > assured that one chunk is staying on the same drive. Even with 4 or > more drives, where this could make sense, in fact it would still be wise > to attempt to move only one of the pair of chunks, and then move the > other if that is also a good idea. What balance does, at its most basic, is rewrite and in the process manipulate chunks in some desired way, depending on the filters used, if any. Once the chunks have been rewritten, the old copies are deleted. But existing chunks are never simply left in place unless the filters exclude them entirely. If they are rewritten, a new chunk is created and the old chunk is removed. Now one of the simplest and most basic effects of this rewrite process is that where two or more chunks of the same type (typically data or metadata) are only partially full, the rewrite process will create a new chunk and start writing, filling it until it is full, then creating another and filling it, etc, which ends up compacting chunks as it rewrites them. So if there's ten chunks and average of 50% full, it'll compact that into five chunks, 100% full. The usage filter is very helpful here, letting you tell balance to only bother with chunks that are under say 10% (usage=10) full, where you'll get a pretty big effect for the effort, as 10 such chunks can be consolidated into one. Of course that would only happen if you /had/ 10 such chunks under 10% full, but at say usage=50, you still get one freed chunk for every two balance rewrites, taking longer, but still far less time than it would take to rewrite 90% full chunks, with far more dramatic effects... as long as there are chunks to balance and combine at that usage level, of course. 
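A concrete example of the usage-filtered balance described above (the mount point and the thresholds are only illustrative):

btrfs balance start -dusage=10 /local    # rewrite only data chunks <= 10% full
btrfs balance start -dusage=50 /local    # then work up to the <= 50% ones
btrfs balance start -musage=30 /local    # the same filter exists for metadata
btrfs balance status /local              # progress, from another shell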
Here, we're using a different side effect, the fact that with a raid1 setup, there are always two copies of the chunk, one on each of exactly two devices, and that when new chunks are allocated, they *SHOULD* be allocated from the devices with the most free space, subject only to the rule that both copies cannot be on the same device, so the effect is that it'll allocate from the device with the most space left for the first copy, and then for the second copy, it'll allocate from the device with the most space left, but where the device list excludes the device that the first copy is on. But, the point that Qu is making is that balance, by definition, rewrites both raid1 copies of the chunk. It can't simply rewrite just the one that's on the fullest device to the most empty and leave the other copy alone. So what it will do is allocate space for a new chunk from each of the two devices with the most space left, and will copy the chunks to them, only releasing the existing copies when the copy is done and the new copies are safely on their respective devices. Which means that at least two devices MUST have space left in ordered to rebalance from raid1 to raid1. If only one device has space left, no rebalance can be done. Now your 3 TB and 4 TB devices, one each, are full, with space left only on the 6 TB device. When you first switched from the 2 TB device to the 6 TB device, the device delete would have rewritten from the 2 TB device to the 6 TB device, and you probably had some space left on the other devices at that point. However, you didn't have enough space left on the other two devices to utilize much of the 6 TB device, because each time you allocated a chunk on the 6 TB device, a chunk had to be allocated on one of the others as well, and they simply didn't have enough space left by that point to do that too many times. Now, you /did/ try to rebalance before you /fully/ ran out of space on the other devices, and that's what Chris and I were thinking should have worked, putting one copy of each rebalanced chunk on the 6 TB device. But, lacking (preferably) btrfs device usage (or btrfs filesystem show, gives a bit less information but does say how much of each device is actually used) reports from /before/ the further fillup, we can't say for sure how much space was actually left. Now here's the question. You said you estimated each drive had ~50 GB free when you did the original replace and then tried to balance, but where did that 50 GB number come from? Here's why it matters. Btrfs allocates space in two steps. First it allocates from the unallocated pool into chunks, which can be data or metadata (there's also system chunks, but that's only a few MiB total, in your case 32 MiB on each of two devices given the raid1, and doesn't change dramatically with usage as data and metadata chunks do). And it can easily happen that all available space is already allocated into (partially used) chunks, so there's no actually unallocated space left on a device in ordered to allocate further chunks, but there's still sufficient space left in the partially used chunks to continue adding and changing files for some time. Only when new chunk allocation is necessary will a problem show up. 
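For reference, the btrfs-specific reports that expose those two layers -- per-device unallocated space versus usage inside already-allocated chunks -- look like this (mount point assumed):

btrfs filesystem show /local     # per-device size vs. allocated
btrfs filesystem df /local       # chunk totals vs. bytes used inside them
btrfs filesystem usage /local    # both views combined
btrfs device usage /local        # device-centric view, incl. unallocated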
Now given the various btrfs reports, btrfs fi show and btrfs fi df, or btrfs fi usage, or for a device-centric report, btrfs dev usage, possibly combined with the other reports depending on what you're trying to figure out, it's quite possible to tell exactly what the status of each of the devices is, regarding both unallocated space as well as allocated chunks, and how much of those allocated chunks is actually used (globally, unfortunately actual usage of the chunk allocation isn't broken down by device, tho that information isn't technically needed per-device). But if you're estimating only based on normal df, not the btrfs versions of the commands, you don't know how much space remained actually unallocated on each device, and for balance, that's the critical thing, particularly with raid1, since it MUST have space to allocate new chunks on AT LEAST TWO devices. Which is where the IRC recommendation to add a 4th device of some GiB came in, the idea being to add enough unallocated space on that 4th device, that being the second device with actually unallocated space, to get you out of the tight spot. There is, however, another factor in play here as well, chunk size. Data chunks are the largest, and are nominally 1 GiB in size. *HOWEVER*, on devices over some particular size, they can increase in size up to, as a dev stated in one thread, 10 GiB. However, while I know it can happen at larger filesystem and device sizes, I don't have the foggiest what the conditions and algorithms for chunk size are. But with TB-scale devices on btrfs, it's very possible, even likely, that you're dealing with over the 1 GiB nominal size. And if you're dealing with 10 GiB chunk sizes, or possibly even larger if I took that dev's chunk size limitation comments out of context and am wrong about that chunk size limit... You may well simply not have a second device with enough unallocated space on it to properly handle the chunk sizes on that filesystem. Certainly, the btrfs fi usage report you posted showed a few gigs of unallocated space on each of three of the four devices (with all sorts of space left on the 6 TB device, of course), but all three were in the single-digits GB, and if most of your data chunks are 10 GiB... you simply don't have a device with enough unallocated space left to write that second copy. Tho adding back that 2 TB device and doing a balance should indeed give you enough space to put a serious dent in that imbalance. But as Qu says, you will likely end up having to rebalance several times in order to get it nicely balanced out, since you'll fill up that under 2 TiB pretty fast from the other two full devices and it'll start round-robinning to all three for the second copy before the other two are even a TiB down from full. Again as Qu says, rebalancing to single and back to raid1 is another option, that should result in a much faster loading of the 6 TB device. I think (but I'm not sure) that the single mode allocator still uses the "most space" allocation algorithm, in which case, given a total raid1 usage of 7.77 TiB, which should be 3.88 TiB (~4.25 TB) in single mode, you should end up with a nearly free 3 TB device, just under 1 TB used on the 4 TB device, and just under 3 TB used on the 6 TB device, basically 3 TB free/unallocated on each of the three devices.
(The tiny 4th device should be left entirely free in that case and should then be trivial to device delete as there will be nothing on it to move to other devices, it'll be a simple change to the system chunk device data and the superblocks on the other three devices.) Then you can rebalance to raid1 mode again, and it should use up that 3 TB on each device relatively evenly, round-robinning an unused device that alternates on each set of chunks copied. While ~3/4 of all chunks should start out with their single-mode copy on the 6 TB device, 3/4 of all chunks deleted will be off it, leaving it free to get one of the two copies most of the time. You should end up with about 1.3 TB free per device, with about 1.6 TB of the 3 TB device allocated, 2.6 TB of the 4 TB device allocated, together pretty well sharing one copy of each chunk between them, and 4.3 T of the 6 TB device used, pretty much one copy of each chunk on its own. The down side to that is that you're left with only a single copy while in single mode, and if that copy gets corrupted, you simply lose whatever was in that now corrupted chunk. If the data's valuable enough, you may thus prefer to do repeated balances. The other alternative of course is to ensure that everything that's not trivially replaced is backed up, and start from scratch with a newly created btrfs on the three devices, restoring to it from backup. That's what I'd do, since the sysadmin's rule of backups in simple form says if it's not backed up, you are by definition of your (in)action, defining that data as worth less than the time/trouble/resources necessary to back it up. So if it's worth the hassle it should be already backed up so you can simply blow away the existing filesystem, create it over new, and restore from backups, and if you don't have those backups, then by definition it's not worth the hassle, and starting over with a fresh filesystem is all three of (1) less hassle, (2) a chance to take advantage of newer filesystem options that weren't available when you first created the existing filesystem, and (3) a clean start, blowing away any chance of some bug lurking in the existing layout waiting to come back and bite you after you've put all the work into those rebalances, if you choose them over the clean start. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 35+ messages in thread
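A minimal command sketch of that single-and-back route (mount point assumed; only the data profile is converted, so metadata keeps its raid1 redundancy throughout):

btrfs balance start -dconvert=single /local        # one copy per chunk, placed by free space
btrfs device delete /dev/sda1 /local               # the 20G helper should now be empty
btrfs balance start -dconvert=raid1,soft /local    # re-mirror; 'soft' skips chunks already raid1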
* Re: RAID-1 refuses to balance large drive 2016-03-23 19:33 ` Chris Murphy 2016-03-24 1:59 ` Qu Wenruo @ 2016-03-25 13:16 ` Patrik Lundquist 2016-03-25 14:35 ` Henk Slager 2016-03-27 4:23 ` Brad Templeton 1 sibling, 2 replies; 35+ messages in thread From: Patrik Lundquist @ 2016-03-25 13:16 UTC (permalink / raw) To: Chris Murphy; +Cc: Brad Templeton, Btrfs BTRFS On 23 March 2016 at 20:33, Chris Murphy <lists@colorremedies.com> wrote: > > On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton <bradtem@gmail.com> wrote: > > > > I am surprised to hear it said that having the mixed sizes is an odd > > case. > > Not odd as in wrong, just uncommon compared to other arrangements being tested. I think mixed drive sizes in raid1 is a killer feature for a home NAS, where you replace an old smaller drive with the latest and largest when you need more storage. My raid1 currently consists of 6TB+3TB+3*2TB. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: RAID-1 refuses to balance large drive 2016-03-25 13:16 ` Patrik Lundquist @ 2016-03-25 14:35 ` Henk Slager 2016-03-26 4:15 ` Duncan [not found] ` <CAHz9+Emc4DsXoMLKYrp1TfN+2r2cXxaJmPyTnpeCZF=h0FhtMg@mail.gmail.com> 2016-03-27 4:23 ` Brad Templeton 1 sibling, 2 replies; 35+ messages in thread From: Henk Slager @ 2016-03-25 14:35 UTC (permalink / raw) To: Patrik Lundquist; +Cc: Chris Murphy, Brad Templeton, Btrfs BTRFS On Fri, Mar 25, 2016 at 2:16 PM, Patrik Lundquist <patrik.lundquist@gmail.com> wrote: > On 23 March 2016 at 20:33, Chris Murphy <lists@colorremedies.com> wrote: >> >> On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton <bradtem@gmail.com> wrote: >> > >> > I am surprised to hear it said that having the mixed sizes is an odd >> > case. >> >> Not odd as in wrong, just uncommon compared to other arrangements being tested. > > I think mixed drive sizes in raid1 is a killer feature for a home NAS, > where you replace an old smaller drive with the latest and largest > when you need more storage. > > My raid1 currently consists of 6TB+3TB+3*2TB. For the original OP situation, with chunks all filled op with extents and devices all filled up with chunks, 'integrating' a new 6TB drive in an 4TB+3TG+2TB raid1 array could probably be done in a bit unusual way in order to avoid immediate balancing needs: - 'plug-in' the 6TB - btrfs-replace 4TB by 6TB - btrfs fi resize max 6TB_devID - btrfs-replace 2TB by 4TB - btrfs fi resize max 4TB_devID - 'unplug' the 2TB So then there would be 2 devices with roughly 2TB space available, so good for continued btrfs raid1 writes. An offline variant with dd instead of btrfs-replace could also be done (I used to do that sometimes when btrfs-replace was not implemented). My experience is that btrfs-replace speed is roughly at max speed (so harddisk magnetic media transferspeed) during the whole replace process and it does in a more direct way what you actually want. So in total mostly way faster device replace/upgrade than with the add+delete method. And raid1 redundancy is active all the time. Of course it means first make sure the system runs up-to-date/latest kernel+tools. ^ permalink raw reply [flat|nested] 35+ messages in thread
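Spelled out as commands, Henk's sequence might look like the following sketch (device nodes and devid numbers are placeholders -- check 'btrfs filesystem show' for the real ones before running anything):

btrfs replace start -B /dev/sd_old4tb /dev/sd_new6tb /mnt       # -B stays in the foreground
btrfs filesystem resize 1:max /mnt                              # grow that devid to the new disk size
btrfs replace start -B -f /dev/sd_old2tb /dev/sd_freed4tb /mnt  # -f: target still carries an old btrfs
btrfs filesystem resize 3:max /mnt
# after the second replace finishes, the 2TB is no longer part of the filesystem and can be unplugged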
* Re: RAID-1 refuses to balance large drive 2016-03-25 14:35 ` Henk Slager @ 2016-03-26 4:15 ` Duncan [not found] ` <CAHz9+Emc4DsXoMLKYrp1TfN+2r2cXxaJmPyTnpeCZF=h0FhtMg@mail.gmail.com> 1 sibling, 0 replies; 35+ messages in thread From: Duncan @ 2016-03-26 4:15 UTC (permalink / raw) To: linux-btrfs Henk Slager posted on Fri, 25 Mar 2016 15:35:52 +0100 as excerpted: > For the original OP situation, with chunks all filled op with extents > and devices all filled up with chunks, 'integrating' a new 6TB drive > in an 4TB+3TG+2TB raid1 array could probably be done in a bit unusual > way in order to avoid immediate balancing needs: > - 'plug-in' the 6TB > - btrfs-replace 4TB by 6TB > - btrfs fi resize max 6TB_devID > - btrfs-replace 2TB by 4TB > - btrfs fi resize max 4TB_devID > - 'unplug' the 2TB Way to think outside the box, Henk! I'll have to remember this as it's a very clever and rather useful method-tool to have in the ol' admin toolbox (aka brain). =:^) I only wish I had thought of it, as it sure seems clear... now that you described it! Greatly appreciated, in any case! =:^) -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 35+ messages in thread
[parent not found: <CAHz9+Emc4DsXoMLKYrp1TfN+2r2cXxaJmPyTnpeCZF=h0FhtMg@mail.gmail.com>]
* Re: RAID-1 refuses to balance large drive [not found] ` <CAHz9+Emc4DsXoMLKYrp1TfN+2r2cXxaJmPyTnpeCZF=h0FhtMg@mail.gmail.com> @ 2018-05-27 1:27 ` Brad Templeton 2018-05-27 1:41 ` Qu Wenruo 2018-06-08 3:23 ` Zygo Blaxell 0 siblings, 2 replies; 35+ messages in thread From: Brad Templeton @ 2018-05-27 1:27 UTC (permalink / raw) To: Btrfs BTRFS A few years ago, I encountered an issue (halfway between a bug and a problem) with attempting to grow a BTRFS 3 disk Raid 1 which was fairly full. The problem was that after replacing (by add/delete) a small drive with a larger one, there were now 2 full drives and one new half-full one, and balance was not able to correct this situation to produce the desired result, which is 3 drives, each with a roughly even amount of free space. It can't do it because the 2 smaller drives are full, and it doesn't realize it could just move one of the copies of a block off the smaller drive onto the larger drive to free space on the smaller drive, it wants to move them both, and there is nowhere to put them both. I'm about to do it again, taking my nearly full array which is 4TB, 4TB, 6TB and replacing one of the 4TB with an 8TB. I don't want to repeat the very time consuming situation, so I wanted to find out if things were fixed now. I am running Xenial (kernel 4.4.0) and could consider the upgrade to bionic (4.15) though that adds a lot more to my plate before a long trip and I would prefer to avoid if I can. So what is the best strategy: a) Replace 4TB with 8TB, resize up and balance? (This is the "basic" strategy) b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks from 4TB but possibly not enough) c) Replace 6TB with 8TB, resize/balance, then replace 4TB with recently vacated 6TB -- much longer procedure but possibly better Or has this all been fixed and method A will work fine and get to the ideal goal -- 3 drives, with available space suitably distributed to allow full utilization over time? On Sat, May 26, 2018 at 6:24 PM, Brad Templeton <bradtem@gmail.com> wrote: > A few years ago, I encountered an issue (halfway between a bug and a > problem) with attempting to grow a BTRFS 3 disk Raid 1 which was fairly > full. The problem was that after replacing (by add/delete) a small drive > with a larger one, there were now 2 full drives and one new half-full one, > and balance was not able to correct this situation to produce the desired > result, which is 3 drives, each with a roughly even amount of free space. > It can't do it because the 2 smaller drives are full, and it doesn't realize > it could just move one of the copies of a block off the smaller drive onto > the larger drive to free space on the smaller drive, it wants to move them > both, and there is nowhere to put them both. > > I'm about to do it again, taking my nearly full array which is 4TB, 4TB, 6TB > and replacing one of the 4TB with an 8TB. I don't want to repeat the very > time consuming situation, so I wanted to find out if things were fixed now. > I am running Xenial (kernel 4.4.0) and could consider the upgrade to bionic > (4.15) though that adds a lot more to my plate before a long trip and I > would prefer to avoid if I can. > > So what is the best strategy: > > a) Replace 4TB with 8TB, resize up and balance? 
(This is the "basic" > strategy) > b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks from > 4TB but possibly not enough) > c) Replace 6TB with 8TB, resize/balance, then replace 4TB with recently > vacated 6TB -- much longer procedure but possibly better > > Or has this all been fixed and method A will work fine and get to the ideal > goal -- 3 drives, with available space suitably distributed to allow full > utilization over time? > > On Fri, Mar 25, 2016 at 7:35 AM, Henk Slager <eye1tm@gmail.com> wrote: >> >> On Fri, Mar 25, 2016 at 2:16 PM, Patrik Lundquist >> <patrik.lundquist@gmail.com> wrote: >> > On 23 March 2016 at 20:33, Chris Murphy <lists@colorremedies.com> wrote: >> >> >> >> On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton <bradtem@gmail.com> >> >> wrote: >> >> > >> >> > I am surprised to hear it said that having the mixed sizes is an odd >> >> > case. >> >> >> >> Not odd as in wrong, just uncommon compared to other arrangements being >> >> tested. >> > >> > I think mixed drive sizes in raid1 is a killer feature for a home NAS, >> > where you replace an old smaller drive with the latest and largest >> > when you need more storage. >> > >> > My raid1 currently consists of 6TB+3TB+3*2TB. >> >> For the original OP situation, with chunks all filled op with extents >> and devices all filled up with chunks, 'integrating' a new 6TB drive >> in an 4TB+3TG+2TB raid1 array could probably be done in a bit unusual >> way in order to avoid immediate balancing needs: >> - 'plug-in' the 6TB >> - btrfs-replace 4TB by 6TB >> - btrfs fi resize max 6TB_devID >> - btrfs-replace 2TB by 4TB >> - btrfs fi resize max 4TB_devID >> - 'unplug' the 2TB >> >> So then there would be 2 devices with roughly 2TB space available, so >> good for continued btrfs raid1 writes. >> >> An offline variant with dd instead of btrfs-replace could also be done >> (I used to do that sometimes when btrfs-replace was not implemented). >> My experience is that btrfs-replace speed is roughly at max speed (so >> harddisk magnetic media transferspeed) during the whole replace >> process and it does in a more direct way what you actually want. So in >> total mostly way faster device replace/upgrade than with the >> add+delete method. And raid1 redundancy is active all the time. Of >> course it means first make sure the system runs up-to-date/latest >> kernel+tools. > > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: RAID-1 refuses to balance large drive 2018-05-27 1:27 ` Brad Templeton @ 2018-05-27 1:41 ` Qu Wenruo 2018-05-27 1:49 ` Brad Templeton 2018-06-08 3:23 ` Zygo Blaxell 1 sibling, 1 reply; 35+ messages in thread From: Qu Wenruo @ 2018-05-27 1:41 UTC (permalink / raw) To: Brad Templeton, Btrfs BTRFS [-- Attachment #1.1: Type: text/plain, Size: 6916 bytes --] On 2018年05月27日 09:27, Brad Templeton wrote: > A few years ago, I encountered an issue (halfway between a bug and a > problem) with attempting to grow a BTRFS 3 disk Raid 1 which was > fairly full. The problem was that after replacing (by add/delete) a > small drive with a larger one, there were now 2 full drives and one > new half-full one, and balance was not able to correct this situation > to produce the desired result, which is 3 drives, each with a roughly > even amount of free space. It can't do it because the 2 smaller > drives are full, and it doesn't realize it could just move one of the > copies of a block off the smaller drive onto the larger drive to free > space on the smaller drive, it wants to move them both, and there is > nowhere to put them both. It's not that easy. For balance, btrfs must first find a large enough space to locate both copy, then copy data. Or if powerloss happens, it will cause data corruption. So in your case, btrfs can only find enough space for one copy, thus unable to relocate any chunk. > > I'm about to do it again, taking my nearly full array which is 4TB, > 4TB, 6TB and replacing one of the 4TB with an 8TB. I don't want to > repeat the very time consuming situation, so I wanted to find out if > things were fixed now. I am running Xenial (kernel 4.4.0) and could > consider the upgrade to bionic (4.15) though that adds a lot more to > my plate before a long trip and I would prefer to avoid if I can. Since there is nothing to fix, the behavior will not change at all. > > So what is the best strategy: > > a) Replace 4TB with 8TB, resize up and balance? (This is the "basic" strategy) > b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks > from 4TB but possibly not enough) > c) Replace 6TB with 8TB, resize/balance, then replace 4TB with > recently vacated 6TB -- much longer procedure but possibly better > > Or has this all been fixed and method A will work fine and get to the > ideal goal -- 3 drives, with available space suitably distributed to > allow full utilization over time? Btrfs chunk allocator is already trying to utilize all drivers for a long long time. When allocate chunks, btrfs will choose the device with the most free space. However the nature of RAID1 needs btrfs to allocate extents from 2 different devices, which makes your replaced 4/4/6 a little complex. (If your 4/4/6 array is set up and then filled to current stage, btrfs should be able to utilize all the space) Personally speaking, if you're confident enough, just add a new device, and then do balance. If enough chunks get balanced, there should be enough space freed on existing disks. Then remove the newly added device, then btrfs should handle the remaining space well. Thanks, Qu > > On Sat, May 26, 2018 at 6:24 PM, Brad Templeton <bradtem@gmail.com> wrote: >> A few years ago, I encountered an issue (halfway between a bug and a >> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was fairly >> full. 
The problem was that after replacing (by add/delete) a small drive >> with a larger one, there were now 2 full drives and one new half-full one, >> and balance was not able to correct this situation to produce the desired >> result, which is 3 drives, each with a roughly even amount of free space. >> It can't do it because the 2 smaller drives are full, and it doesn't realize >> it could just move one of the copies of a block off the smaller drive onto >> the larger drive to free space on the smaller drive, it wants to move them >> both, and there is nowhere to put them both. >> >> I'm about to do it again, taking my nearly full array which is 4TB, 4TB, 6TB >> and replacing one of the 4TB with an 8TB. I don't want to repeat the very >> time consuming situation, so I wanted to find out if things were fixed now. >> I am running Xenial (kernel 4.4.0) and could consider the upgrade to bionic >> (4.15) though that adds a lot more to my plate before a long trip and I >> would prefer to avoid if I can. >> >> So what is the best strategy: >> >> a) Replace 4TB with 8TB, resize up and balance? (This is the "basic" >> strategy) >> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks from >> 4TB but possibly not enough) >> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with recently >> vacated 6TB -- much longer procedure but possibly better >> >> Or has this all been fixed and method A will work fine and get to the ideal >> goal -- 3 drives, with available space suitably distributed to allow full >> utilization over time? >> >> On Fri, Mar 25, 2016 at 7:35 AM, Henk Slager <eye1tm@gmail.com> wrote: >>> >>> On Fri, Mar 25, 2016 at 2:16 PM, Patrik Lundquist >>> <patrik.lundquist@gmail.com> wrote: >>>> On 23 March 2016 at 20:33, Chris Murphy <lists@colorremedies.com> wrote: >>>>> >>>>> On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton <bradtem@gmail.com> >>>>> wrote: >>>>>> >>>>>> I am surprised to hear it said that having the mixed sizes is an odd >>>>>> case. >>>>> >>>>> Not odd as in wrong, just uncommon compared to other arrangements being >>>>> tested. >>>> >>>> I think mixed drive sizes in raid1 is a killer feature for a home NAS, >>>> where you replace an old smaller drive with the latest and largest >>>> when you need more storage. >>>> >>>> My raid1 currently consists of 6TB+3TB+3*2TB. >>> >>> For the original OP situation, with chunks all filled op with extents >>> and devices all filled up with chunks, 'integrating' a new 6TB drive >>> in an 4TB+3TG+2TB raid1 array could probably be done in a bit unusual >>> way in order to avoid immediate balancing needs: >>> - 'plug-in' the 6TB >>> - btrfs-replace 4TB by 6TB >>> - btrfs fi resize max 6TB_devID >>> - btrfs-replace 2TB by 4TB >>> - btrfs fi resize max 4TB_devID >>> - 'unplug' the 2TB >>> >>> So then there would be 2 devices with roughly 2TB space available, so >>> good for continued btrfs raid1 writes. >>> >>> An offline variant with dd instead of btrfs-replace could also be done >>> (I used to do that sometimes when btrfs-replace was not implemented). >>> My experience is that btrfs-replace speed is roughly at max speed (so >>> harddisk magnetic media transferspeed) during the whole replace >>> process and it does in a more direct way what you actually want. So in >>> total mostly way faster device replace/upgrade than with the >>> add+delete method. And raid1 redundancy is active all the time. Of >>> course it means first make sure the system runs up-to-date/latest >>> kernel+tools. 
>> >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
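A minimal sketch of the add-balance-remove route Qu suggests above (the spare device path and mount point are assumptions):

btrfs device add -f /dev/sd_spare /local
btrfs balance start /local                 # may need to be repeated a few times
btrfs device remove /dev/sd_spare /local   # relocates whatever landed on the spare
btrfs device usage /local                  # check how even the result looks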
* Re: RAID-1 refuses to balance large drive 2018-05-27 1:41 ` Qu Wenruo @ 2018-05-27 1:49 ` Brad Templeton 2018-05-27 1:56 ` Qu Wenruo 0 siblings, 1 reply; 35+ messages in thread From: Brad Templeton @ 2018-05-27 1:49 UTC (permalink / raw) To: Qu Wenruo; +Cc: Btrfs BTRFS That is what did not work last time. I say I think there can be a "fix" because I hope the goal of BTRFS raid is to be superior to traditional RAID. That if one replaces a drive, and asks to balance, it figures out what needs to be done to make that work. I understand that the current balance algorithm may have trouble with that. In this situation, the ideal result would be the system would take the 3 drives (4TB and 6TB full, 8TB with 4TB free) and move extents strictly from the 4TB and 6TB to the 8TB -- ie extents which are currently on both the 4TB and 6TB -- by moving only one copy. It is not strictly a "bug" in that the code is operating as designed, but it is an undesired function. The problem is the approach you describe did not work in the prior upgrade. On Sat, May 26, 2018 at 6:41 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > > On 2018年05月27日 09:27, Brad Templeton wrote: >> A few years ago, I encountered an issue (halfway between a bug and a >> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was >> fairly full. The problem was that after replacing (by add/delete) a >> small drive with a larger one, there were now 2 full drives and one >> new half-full one, and balance was not able to correct this situation >> to produce the desired result, which is 3 drives, each with a roughly >> even amount of free space. It can't do it because the 2 smaller >> drives are full, and it doesn't realize it could just move one of the >> copies of a block off the smaller drive onto the larger drive to free >> space on the smaller drive, it wants to move them both, and there is >> nowhere to put them both. > > It's not that easy. > For balance, btrfs must first find a large enough space to locate both > copy, then copy data. > Or if powerloss happens, it will cause data corruption. > > So in your case, btrfs can only find enough space for one copy, thus > unable to relocate any chunk. > >> >> I'm about to do it again, taking my nearly full array which is 4TB, >> 4TB, 6TB and replacing one of the 4TB with an 8TB. I don't want to >> repeat the very time consuming situation, so I wanted to find out if >> things were fixed now. I am running Xenial (kernel 4.4.0) and could >> consider the upgrade to bionic (4.15) though that adds a lot more to >> my plate before a long trip and I would prefer to avoid if I can. > > Since there is nothing to fix, the behavior will not change at all. > >> >> So what is the best strategy: >> >> a) Replace 4TB with 8TB, resize up and balance? (This is the "basic" strategy) >> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks >> from 4TB but possibly not enough) >> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with >> recently vacated 6TB -- much longer procedure but possibly better >> >> Or has this all been fixed and method A will work fine and get to the >> ideal goal -- 3 drives, with available space suitably distributed to >> allow full utilization over time? > > Btrfs chunk allocator is already trying to utilize all drivers for a > long long time. > When allocate chunks, btrfs will choose the device with the most free > space. 
However the nature of RAID1 needs btrfs to allocate extents from > 2 different devices, which makes your replaced 4/4/6 a little complex. > (If your 4/4/6 array is set up and then filled to current stage, btrfs > should be able to utilize all the space) > > > Personally speaking, if you're confident enough, just add a new device, > and then do balance. > If enough chunks get balanced, there should be enough space freed on > existing disks. > Then remove the newly added device, then btrfs should handle the > remaining space well. > > Thanks, > Qu > >> >> On Sat, May 26, 2018 at 6:24 PM, Brad Templeton <bradtem@gmail.com> wrote: >>> A few years ago, I encountered an issue (halfway between a bug and a >>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was fairly >>> full. The problem was that after replacing (by add/delete) a small drive >>> with a larger one, there were now 2 full drives and one new half-full one, >>> and balance was not able to correct this situation to produce the desired >>> result, which is 3 drives, each with a roughly even amount of free space. >>> It can't do it because the 2 smaller drives are full, and it doesn't realize >>> it could just move one of the copies of a block off the smaller drive onto >>> the larger drive to free space on the smaller drive, it wants to move them >>> both, and there is nowhere to put them both. >>> >>> I'm about to do it again, taking my nearly full array which is 4TB, 4TB, 6TB >>> and replacing one of the 4TB with an 8TB. I don't want to repeat the very >>> time consuming situation, so I wanted to find out if things were fixed now. >>> I am running Xenial (kernel 4.4.0) and could consider the upgrade to bionic >>> (4.15) though that adds a lot more to my plate before a long trip and I >>> would prefer to avoid if I can. >>> >>> So what is the best strategy: >>> >>> a) Replace 4TB with 8TB, resize up and balance? (This is the "basic" >>> strategy) >>> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks from >>> 4TB but possibly not enough) >>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with recently >>> vacated 6TB -- much longer procedure but possibly better >>> >>> Or has this all been fixed and method A will work fine and get to the ideal >>> goal -- 3 drives, with available space suitably distributed to allow full >>> utilization over time? >>> >>> On Fri, Mar 25, 2016 at 7:35 AM, Henk Slager <eye1tm@gmail.com> wrote: >>>> >>>> On Fri, Mar 25, 2016 at 2:16 PM, Patrik Lundquist >>>> <patrik.lundquist@gmail.com> wrote: >>>>> On 23 March 2016 at 20:33, Chris Murphy <lists@colorremedies.com> wrote: >>>>>> >>>>>> On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton <bradtem@gmail.com> >>>>>> wrote: >>>>>>> >>>>>>> I am surprised to hear it said that having the mixed sizes is an odd >>>>>>> case. >>>>>> >>>>>> Not odd as in wrong, just uncommon compared to other arrangements being >>>>>> tested. >>>>> >>>>> I think mixed drive sizes in raid1 is a killer feature for a home NAS, >>>>> where you replace an old smaller drive with the latest and largest >>>>> when you need more storage. >>>>> >>>>> My raid1 currently consists of 6TB+3TB+3*2TB. 
>>>> >>>> For the original OP situation, with chunks all filled op with extents >>>> and devices all filled up with chunks, 'integrating' a new 6TB drive >>>> in an 4TB+3TG+2TB raid1 array could probably be done in a bit unusual >>>> way in order to avoid immediate balancing needs: >>>> - 'plug-in' the 6TB >>>> - btrfs-replace 4TB by 6TB >>>> - btrfs fi resize max 6TB_devID >>>> - btrfs-replace 2TB by 4TB >>>> - btrfs fi resize max 4TB_devID >>>> - 'unplug' the 2TB >>>> >>>> So then there would be 2 devices with roughly 2TB space available, so >>>> good for continued btrfs raid1 writes. >>>> >>>> An offline variant with dd instead of btrfs-replace could also be done >>>> (I used to do that sometimes when btrfs-replace was not implemented). >>>> My experience is that btrfs-replace speed is roughly at max speed (so >>>> harddisk magnetic media transferspeed) during the whole replace >>>> process and it does in a more direct way what you actually want. So in >>>> total mostly way faster device replace/upgrade than with the >>>> add+delete method. And raid1 redundancy is active all the time. Of >>>> course it means first make sure the system runs up-to-date/latest >>>> kernel+tools. >>> >>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > ^ permalink raw reply [flat|nested] 35+ messages in thread
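[Editorial sketch] The replace-and-resize sequence Henk describes in the quoted text above corresponds roughly to the commands below. This is only a sketch: the device names, the devid placeholder and the /mnt mount point are illustrative assumptions, not values taken from the thread.

  # Sketch of the quoted replace-and-resize upgrade (placeholder device names).
  btrfs replace start /dev/OLD_SMALL /dev/NEW_BIG /mnt   # online copy of the old device onto the new one
  btrfs replace status /mnt                              # wait until the copy reports "finished"
  btrfs filesystem show /mnt                              # note the devid now assigned to the new disk
  btrfs filesystem resize <devid>:max /mnt                # grow that device so its extra capacity becomes usable

Until the resize step is run, the filesystem does not use the capacity beyond the old device's size, which is why the quoted procedure pairs every replace with a resize.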
* Re: RAID-1 refuses to balance large drive 2018-05-27 1:49 ` Brad Templeton @ 2018-05-27 1:56 ` Qu Wenruo 2018-05-27 2:06 ` Brad Templeton 0 siblings, 1 reply; 35+ messages in thread From: Qu Wenruo @ 2018-05-27 1:56 UTC (permalink / raw) To: Brad Templeton; +Cc: Btrfs BTRFS [-- Attachment #1.1: Type: text/plain, Size: 8677 bytes --] On 2018年05月27日 09:49, Brad Templeton wrote: > That is what did not work last time. > > I say I think there can be a "fix" because I hope the goal of BTRFS > raid is to be superior to traditional RAID. That if one replaces a > drive, and asks to balance, it figures out what needs to be done to > make that work. I understand that the current balance algorithm may > have trouble with that. In this situation, the ideal result would be > the system would take the 3 drives (4TB and 6TB full, 8TB with 4TB > free) and move extents strictly from the 4TB and 6TB to the 8TB -- ie > extents which are currently on both the 4TB and 6TB -- by moving only > one copy. Btrfs can only do balance in a chunk unit. Thus btrfs can only do: 1) Create new chunk 2) Copy data 3) Remove old chunk. So it can't do the way you mentioned. But your purpose sounds pretty valid and maybe we could enhanace btrfs to do such thing. (Currently only replace can behave like that) > It is not strictly a "bug" in that the code is operating > as designed, but it is an undesired function. > > The problem is the approach you describe did not work in the prior upgrade. Would you please try 4/4/6 + 4 or 4/4/6 + 2 and then balance? And before/after balance, "btrfs fi usage" and "btrfs fi show" output could also help. Thanks, Qu > > On Sat, May 26, 2018 at 6:41 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >> >> >> On 2018年05月27日 09:27, Brad Templeton wrote: >>> A few years ago, I encountered an issue (halfway between a bug and a >>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was >>> fairly full. The problem was that after replacing (by add/delete) a >>> small drive with a larger one, there were now 2 full drives and one >>> new half-full one, and balance was not able to correct this situation >>> to produce the desired result, which is 3 drives, each with a roughly >>> even amount of free space. It can't do it because the 2 smaller >>> drives are full, and it doesn't realize it could just move one of the >>> copies of a block off the smaller drive onto the larger drive to free >>> space on the smaller drive, it wants to move them both, and there is >>> nowhere to put them both. >> >> It's not that easy. >> For balance, btrfs must first find a large enough space to locate both >> copy, then copy data. >> Or if powerloss happens, it will cause data corruption. >> >> So in your case, btrfs can only find enough space for one copy, thus >> unable to relocate any chunk. >> >>> >>> I'm about to do it again, taking my nearly full array which is 4TB, >>> 4TB, 6TB and replacing one of the 4TB with an 8TB. I don't want to >>> repeat the very time consuming situation, so I wanted to find out if >>> things were fixed now. I am running Xenial (kernel 4.4.0) and could >>> consider the upgrade to bionic (4.15) though that adds a lot more to >>> my plate before a long trip and I would prefer to avoid if I can. >> >> Since there is nothing to fix, the behavior will not change at all. >> >>> >>> So what is the best strategy: >>> >>> a) Replace 4TB with 8TB, resize up and balance? 
(This is the "basic" strategy) >>> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks >>> from 4TB but possibly not enough) >>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with >>> recently vacated 6TB -- much longer procedure but possibly better >>> >>> Or has this all been fixed and method A will work fine and get to the >>> ideal goal -- 3 drives, with available space suitably distributed to >>> allow full utilization over time? >> >> Btrfs chunk allocator is already trying to utilize all drivers for a >> long long time. >> When allocate chunks, btrfs will choose the device with the most free >> space. However the nature of RAID1 needs btrfs to allocate extents from >> 2 different devices, which makes your replaced 4/4/6 a little complex. >> (If your 4/4/6 array is set up and then filled to current stage, btrfs >> should be able to utilize all the space) >> >> >> Personally speaking, if you're confident enough, just add a new device, >> and then do balance. >> If enough chunks get balanced, there should be enough space freed on >> existing disks. >> Then remove the newly added device, then btrfs should handle the >> remaining space well. >> >> Thanks, >> Qu >> >>> >>> On Sat, May 26, 2018 at 6:24 PM, Brad Templeton <bradtem@gmail.com> wrote: >>>> A few years ago, I encountered an issue (halfway between a bug and a >>>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was fairly >>>> full. The problem was that after replacing (by add/delete) a small drive >>>> with a larger one, there were now 2 full drives and one new half-full one, >>>> and balance was not able to correct this situation to produce the desired >>>> result, which is 3 drives, each with a roughly even amount of free space. >>>> It can't do it because the 2 smaller drives are full, and it doesn't realize >>>> it could just move one of the copies of a block off the smaller drive onto >>>> the larger drive to free space on the smaller drive, it wants to move them >>>> both, and there is nowhere to put them both. >>>> >>>> I'm about to do it again, taking my nearly full array which is 4TB, 4TB, 6TB >>>> and replacing one of the 4TB with an 8TB. I don't want to repeat the very >>>> time consuming situation, so I wanted to find out if things were fixed now. >>>> I am running Xenial (kernel 4.4.0) and could consider the upgrade to bionic >>>> (4.15) though that adds a lot more to my plate before a long trip and I >>>> would prefer to avoid if I can. >>>> >>>> So what is the best strategy: >>>> >>>> a) Replace 4TB with 8TB, resize up and balance? (This is the "basic" >>>> strategy) >>>> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks from >>>> 4TB but possibly not enough) >>>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with recently >>>> vacated 6TB -- much longer procedure but possibly better >>>> >>>> Or has this all been fixed and method A will work fine and get to the ideal >>>> goal -- 3 drives, with available space suitably distributed to allow full >>>> utilization over time? >>>> >>>> On Fri, Mar 25, 2016 at 7:35 AM, Henk Slager <eye1tm@gmail.com> wrote: >>>>> >>>>> On Fri, Mar 25, 2016 at 2:16 PM, Patrik Lundquist >>>>> <patrik.lundquist@gmail.com> wrote: >>>>>> On 23 March 2016 at 20:33, Chris Murphy <lists@colorremedies.com> wrote: >>>>>>> >>>>>>> On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton <bradtem@gmail.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> I am surprised to hear it said that having the mixed sizes is an odd >>>>>>>> case. 
>>>>>>> >>>>>>> Not odd as in wrong, just uncommon compared to other arrangements being >>>>>>> tested. >>>>>> >>>>>> I think mixed drive sizes in raid1 is a killer feature for a home NAS, >>>>>> where you replace an old smaller drive with the latest and largest >>>>>> when you need more storage. >>>>>> >>>>>> My raid1 currently consists of 6TB+3TB+3*2TB. >>>>> >>>>> For the original OP situation, with chunks all filled op with extents >>>>> and devices all filled up with chunks, 'integrating' a new 6TB drive >>>>> in an 4TB+3TG+2TB raid1 array could probably be done in a bit unusual >>>>> way in order to avoid immediate balancing needs: >>>>> - 'plug-in' the 6TB >>>>> - btrfs-replace 4TB by 6TB >>>>> - btrfs fi resize max 6TB_devID >>>>> - btrfs-replace 2TB by 4TB >>>>> - btrfs fi resize max 4TB_devID >>>>> - 'unplug' the 2TB >>>>> >>>>> So then there would be 2 devices with roughly 2TB space available, so >>>>> good for continued btrfs raid1 writes. >>>>> >>>>> An offline variant with dd instead of btrfs-replace could also be done >>>>> (I used to do that sometimes when btrfs-replace was not implemented). >>>>> My experience is that btrfs-replace speed is roughly at max speed (so >>>>> harddisk magnetic media transferspeed) during the whole replace >>>>> process and it does in a more direct way what you actually want. So in >>>>> total mostly way faster device replace/upgrade than with the >>>>> add+delete method. And raid1 redundancy is active all the time. Of >>>>> course it means first make sure the system runs up-to-date/latest >>>>> kernel+tools. >>>> >>>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
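[Editorial sketch] Qu's suggestion above -- add a spare device, balance, then remove it, capturing "btrfs fi show" and "btrfs fi usage" before and after -- could look roughly like the following. It is a sketch only; /dev/SPARE and /mnt are placeholder names.

  btrfs filesystem show /mnt && btrfs filesystem usage /mnt   # record the starting state
  btrfs device add /dev/SPARE /mnt                            # temporarily widen the array
  btrfs balance start /mnt                                    # full balance; chunk pairs can now spread onto the spare
  btrfs device delete /dev/SPARE /mnt                         # migrate its chunks back and drop the spare
  btrfs filesystem show /mnt && btrfs filesystem usage /mnt   # compare the resulting distribution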
* Re: RAID-1 refuses to balance large drive 2018-05-27 1:56 ` Qu Wenruo @ 2018-05-27 2:06 ` Brad Templeton 2018-05-27 2:16 ` Qu Wenruo 0 siblings, 1 reply; 35+ messages in thread From: Brad Templeton @ 2018-05-27 2:06 UTC (permalink / raw) To: Qu Wenruo; +Cc: Btrfs BTRFS Thanks. These are all things which take substantial fractions of a day to try, unfortunately. Last time I ended up fixing it in a fairly kluged way, which was to convert from raid-1 to single long enough to get enough single blocks that when I converted back to raid-1 they got distributed to the right drives. But this is, aside from being a kludge, a procedure with some minor risk. Of course I am taking a backup first, but still... This strikes me as something that should be a fairly common event -- your raid is filling up, and so you expand it by replacing the oldest and smallest drive with a new much bigger one. In the old days of RAID, you could not do that, you had to grow all drives at the same time, and this is one of the ways that BTRFS is quite superior. When I had MD raid, I went through a strange process of always having a raid 5 that consisted of different sized drives. The raid-5 was based on the smallest of the 3 drives, and then the larger ones had extra space which could either be in raid-1, or more imply was in solo disk mode and used for less critical data (such as backups and old archives.) Slowly, and in a messy way, each time I replaced the smallest drive, I could then grow the raid 5. Yuck. BTRFS is so much better, except for this issue. So if somebody has a thought of a procedure that is fairly sure to work and doesn't involve too many copying passes -- copying 4tb is not a quick operation -- it is much appreciated and might be a good thing to add to a wiki page, which I would be happy to do. On Sat, May 26, 2018 at 6:56 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > > On 2018年05月27日 09:49, Brad Templeton wrote: >> That is what did not work last time. >> >> I say I think there can be a "fix" because I hope the goal of BTRFS >> raid is to be superior to traditional RAID. That if one replaces a >> drive, and asks to balance, it figures out what needs to be done to >> make that work. I understand that the current balance algorithm may >> have trouble with that. In this situation, the ideal result would be >> the system would take the 3 drives (4TB and 6TB full, 8TB with 4TB >> free) and move extents strictly from the 4TB and 6TB to the 8TB -- ie >> extents which are currently on both the 4TB and 6TB -- by moving only >> one copy. > > Btrfs can only do balance in a chunk unit. > Thus btrfs can only do: > 1) Create new chunk > 2) Copy data > 3) Remove old chunk. > > So it can't do the way you mentioned. > But your purpose sounds pretty valid and maybe we could enhanace btrfs > to do such thing. > (Currently only replace can behave like that) > >> It is not strictly a "bug" in that the code is operating >> as designed, but it is an undesired function. >> >> The problem is the approach you describe did not work in the prior upgrade. > > Would you please try 4/4/6 + 4 or 4/4/6 + 2 and then balance? > And before/after balance, "btrfs fi usage" and "btrfs fi show" output > could also help. > > Thanks, > Qu > >> >> On Sat, May 26, 2018 at 6:41 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >>> >>> >>> On 2018年05月27日 09:27, Brad Templeton wrote: >>>> A few years ago, I encountered an issue (halfway between a bug and a >>>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was >>>> fairly full. 
The problem was that after replacing (by add/delete) a >>>> small drive with a larger one, there were now 2 full drives and one >>>> new half-full one, and balance was not able to correct this situation >>>> to produce the desired result, which is 3 drives, each with a roughly >>>> even amount of free space. It can't do it because the 2 smaller >>>> drives are full, and it doesn't realize it could just move one of the >>>> copies of a block off the smaller drive onto the larger drive to free >>>> space on the smaller drive, it wants to move them both, and there is >>>> nowhere to put them both. >>> >>> It's not that easy. >>> For balance, btrfs must first find a large enough space to locate both >>> copy, then copy data. >>> Or if powerloss happens, it will cause data corruption. >>> >>> So in your case, btrfs can only find enough space for one copy, thus >>> unable to relocate any chunk. >>> >>>> >>>> I'm about to do it again, taking my nearly full array which is 4TB, >>>> 4TB, 6TB and replacing one of the 4TB with an 8TB. I don't want to >>>> repeat the very time consuming situation, so I wanted to find out if >>>> things were fixed now. I am running Xenial (kernel 4.4.0) and could >>>> consider the upgrade to bionic (4.15) though that adds a lot more to >>>> my plate before a long trip and I would prefer to avoid if I can. >>> >>> Since there is nothing to fix, the behavior will not change at all. >>> >>>> >>>> So what is the best strategy: >>>> >>>> a) Replace 4TB with 8TB, resize up and balance? (This is the "basic" strategy) >>>> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks >>>> from 4TB but possibly not enough) >>>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with >>>> recently vacated 6TB -- much longer procedure but possibly better >>>> >>>> Or has this all been fixed and method A will work fine and get to the >>>> ideal goal -- 3 drives, with available space suitably distributed to >>>> allow full utilization over time? >>> >>> Btrfs chunk allocator is already trying to utilize all drivers for a >>> long long time. >>> When allocate chunks, btrfs will choose the device with the most free >>> space. However the nature of RAID1 needs btrfs to allocate extents from >>> 2 different devices, which makes your replaced 4/4/6 a little complex. >>> (If your 4/4/6 array is set up and then filled to current stage, btrfs >>> should be able to utilize all the space) >>> >>> >>> Personally speaking, if you're confident enough, just add a new device, >>> and then do balance. >>> If enough chunks get balanced, there should be enough space freed on >>> existing disks. >>> Then remove the newly added device, then btrfs should handle the >>> remaining space well. >>> >>> Thanks, >>> Qu >>> >>>> >>>> On Sat, May 26, 2018 at 6:24 PM, Brad Templeton <bradtem@gmail.com> wrote: >>>>> A few years ago, I encountered an issue (halfway between a bug and a >>>>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was fairly >>>>> full. The problem was that after replacing (by add/delete) a small drive >>>>> with a larger one, there were now 2 full drives and one new half-full one, >>>>> and balance was not able to correct this situation to produce the desired >>>>> result, which is 3 drives, each with a roughly even amount of free space. 
>>>>> It can't do it because the 2 smaller drives are full, and it doesn't realize >>>>> it could just move one of the copies of a block off the smaller drive onto >>>>> the larger drive to free space on the smaller drive, it wants to move them >>>>> both, and there is nowhere to put them both. >>>>> >>>>> I'm about to do it again, taking my nearly full array which is 4TB, 4TB, 6TB >>>>> and replacing one of the 4TB with an 8TB. I don't want to repeat the very >>>>> time consuming situation, so I wanted to find out if things were fixed now. >>>>> I am running Xenial (kernel 4.4.0) and could consider the upgrade to bionic >>>>> (4.15) though that adds a lot more to my plate before a long trip and I >>>>> would prefer to avoid if I can. >>>>> >>>>> So what is the best strategy: >>>>> >>>>> a) Replace 4TB with 8TB, resize up and balance? (This is the "basic" >>>>> strategy) >>>>> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks from >>>>> 4TB but possibly not enough) >>>>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with recently >>>>> vacated 6TB -- much longer procedure but possibly better >>>>> >>>>> Or has this all been fixed and method A will work fine and get to the ideal >>>>> goal -- 3 drives, with available space suitably distributed to allow full >>>>> utilization over time? >>>>> >>>>> On Fri, Mar 25, 2016 at 7:35 AM, Henk Slager <eye1tm@gmail.com> wrote: >>>>>> >>>>>> On Fri, Mar 25, 2016 at 2:16 PM, Patrik Lundquist >>>>>> <patrik.lundquist@gmail.com> wrote: >>>>>>> On 23 March 2016 at 20:33, Chris Murphy <lists@colorremedies.com> wrote: >>>>>>>> >>>>>>>> On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton <bradtem@gmail.com> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> I am surprised to hear it said that having the mixed sizes is an odd >>>>>>>>> case. >>>>>>>> >>>>>>>> Not odd as in wrong, just uncommon compared to other arrangements being >>>>>>>> tested. >>>>>>> >>>>>>> I think mixed drive sizes in raid1 is a killer feature for a home NAS, >>>>>>> where you replace an old smaller drive with the latest and largest >>>>>>> when you need more storage. >>>>>>> >>>>>>> My raid1 currently consists of 6TB+3TB+3*2TB. >>>>>> >>>>>> For the original OP situation, with chunks all filled op with extents >>>>>> and devices all filled up with chunks, 'integrating' a new 6TB drive >>>>>> in an 4TB+3TG+2TB raid1 array could probably be done in a bit unusual >>>>>> way in order to avoid immediate balancing needs: >>>>>> - 'plug-in' the 6TB >>>>>> - btrfs-replace 4TB by 6TB >>>>>> - btrfs fi resize max 6TB_devID >>>>>> - btrfs-replace 2TB by 4TB >>>>>> - btrfs fi resize max 4TB_devID >>>>>> - 'unplug' the 2TB >>>>>> >>>>>> So then there would be 2 devices with roughly 2TB space available, so >>>>>> good for continued btrfs raid1 writes. >>>>>> >>>>>> An offline variant with dd instead of btrfs-replace could also be done >>>>>> (I used to do that sometimes when btrfs-replace was not implemented). >>>>>> My experience is that btrfs-replace speed is roughly at max speed (so >>>>>> harddisk magnetic media transferspeed) during the whole replace >>>>>> process and it does in a more direct way what you actually want. So in >>>>>> total mostly way faster device replace/upgrade than with the >>>>>> add+delete method. And raid1 redundancy is active all the time. Of >>>>>> course it means first make sure the system runs up-to-date/latest >>>>>> kernel+tools. 
>>>>> >>>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>> > ^ permalink raw reply [flat|nested] 35+ messages in thread
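[Editorial sketch] The convert "kludge" Brad describes -- dropping data to the single profile long enough for chunks to redistribute, then converting back -- is sketched below. It is deliberately hedged: while data chunks are single, a disk failure can lose data, so a current backup is assumed, and /mnt is a placeholder mount point.

  btrfs balance start -dconvert=single /mnt   # demote data chunks to single (redundancy reduced until reconverted)
  btrfs balance start -dconvert=raid1 /mnt    # convert back; new chunk pairs favour the devices with the most free space
  btrfs filesystem usage /mnt                 # confirm unallocated space is now spread across the drives

A partial conversion, as mentioned above, can be approximated by adding a limit filter (for example "-dconvert=single,limit=50") so that only a bounded number of chunks is demoted per pass.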
* Re: RAID-1 refuses to balance large drive 2018-05-27 2:06 ` Brad Templeton @ 2018-05-27 2:16 ` Qu Wenruo 2018-05-27 2:21 ` Brad Templeton 0 siblings, 1 reply; 35+ messages in thread From: Qu Wenruo @ 2018-05-27 2:16 UTC (permalink / raw) To: Brad Templeton; +Cc: Btrfs BTRFS [-- Attachment #1.1: Type: text/plain, Size: 11573 bytes --] On 2018年05月27日 10:06, Brad Templeton wrote: > Thanks. These are all things which take substantial fractions of a > day to try, unfortunately. Normally I would suggest just using VM and several small disks (~10G), along with fallocate (the fastest way to use space) to get a basic view of the procedure. > Last time I ended up fixing it in a > fairly kluged way, which was to convert from raid-1 to single long > enough to get enough single blocks that when I converted back to > raid-1 they got distributed to the right drives. Yep, that's the ultimate one-fit-all solution. Also, this reminds me about the fact we could do the RAID1->Single/DUP->Single downgrade in a much much faster way. I think it's worthy considering for later enhancement. > But this is, aside > from being a kludge, a procedure with some minor risk. Of course I am > taking a backup first, but still... > > This strikes me as something that should be a fairly common event -- > your raid is filling up, and so you expand it by replacing the oldest > and smallest drive with a new much bigger one. In the old days of > RAID, you could not do that, you had to grow all drives at the same > time, and this is one of the ways that BTRFS is quite superior. > When I had MD raid, I went through a strange process of always having > a raid 5 that consisted of different sized drives. The raid-5 was > based on the smallest of the 3 drives, and then the larger ones had > extra space which could either be in raid-1, or more imply was in solo > disk mode and used for less critical data (such as backups and old > archives.) Slowly, and in a messy way, each time I replaced the > smallest drive, I could then grow the raid 5. Yuck. BTRFS is so > much better, except for this issue. > > So if somebody has a thought of a procedure that is fairly sure to > work and doesn't involve too many copying passes -- copying 4tb is not > a quick operation -- it is much appreciated and might be a good thing > to add to a wiki page, which I would be happy to do. Anyway, "btrfs fi show" and "btrfs fi usage" would help before any further advice from community. Thanks, Qu > > On Sat, May 26, 2018 at 6:56 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >> >> >> On 2018年05月27日 09:49, Brad Templeton wrote: >>> That is what did not work last time. >>> >>> I say I think there can be a "fix" because I hope the goal of BTRFS >>> raid is to be superior to traditional RAID. That if one replaces a >>> drive, and asks to balance, it figures out what needs to be done to >>> make that work. I understand that the current balance algorithm may >>> have trouble with that. In this situation, the ideal result would be >>> the system would take the 3 drives (4TB and 6TB full, 8TB with 4TB >>> free) and move extents strictly from the 4TB and 6TB to the 8TB -- ie >>> extents which are currently on both the 4TB and 6TB -- by moving only >>> one copy. >> >> Btrfs can only do balance in a chunk unit. >> Thus btrfs can only do: >> 1) Create new chunk >> 2) Copy data >> 3) Remove old chunk. >> >> So it can't do the way you mentioned. >> But your purpose sounds pretty valid and maybe we could enhanace btrfs >> to do such thing. 
>> (Currently only replace can behave like that) >> >>> It is not strictly a "bug" in that the code is operating >>> as designed, but it is an undesired function. >>> >>> The problem is the approach you describe did not work in the prior upgrade. >> >> Would you please try 4/4/6 + 4 or 4/4/6 + 2 and then balance? >> And before/after balance, "btrfs fi usage" and "btrfs fi show" output >> could also help. >> >> Thanks, >> Qu >> >>> >>> On Sat, May 26, 2018 at 6:41 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >>>> >>>> >>>> On 2018年05月27日 09:27, Brad Templeton wrote: >>>>> A few years ago, I encountered an issue (halfway between a bug and a >>>>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was >>>>> fairly full. The problem was that after replacing (by add/delete) a >>>>> small drive with a larger one, there were now 2 full drives and one >>>>> new half-full one, and balance was not able to correct this situation >>>>> to produce the desired result, which is 3 drives, each with a roughly >>>>> even amount of free space. It can't do it because the 2 smaller >>>>> drives are full, and it doesn't realize it could just move one of the >>>>> copies of a block off the smaller drive onto the larger drive to free >>>>> space on the smaller drive, it wants to move them both, and there is >>>>> nowhere to put them both. >>>> >>>> It's not that easy. >>>> For balance, btrfs must first find a large enough space to locate both >>>> copy, then copy data. >>>> Or if powerloss happens, it will cause data corruption. >>>> >>>> So in your case, btrfs can only find enough space for one copy, thus >>>> unable to relocate any chunk. >>>> >>>>> >>>>> I'm about to do it again, taking my nearly full array which is 4TB, >>>>> 4TB, 6TB and replacing one of the 4TB with an 8TB. I don't want to >>>>> repeat the very time consuming situation, so I wanted to find out if >>>>> things were fixed now. I am running Xenial (kernel 4.4.0) and could >>>>> consider the upgrade to bionic (4.15) though that adds a lot more to >>>>> my plate before a long trip and I would prefer to avoid if I can. >>>> >>>> Since there is nothing to fix, the behavior will not change at all. >>>> >>>>> >>>>> So what is the best strategy: >>>>> >>>>> a) Replace 4TB with 8TB, resize up and balance? (This is the "basic" strategy) >>>>> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks >>>>> from 4TB but possibly not enough) >>>>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with >>>>> recently vacated 6TB -- much longer procedure but possibly better >>>>> >>>>> Or has this all been fixed and method A will work fine and get to the >>>>> ideal goal -- 3 drives, with available space suitably distributed to >>>>> allow full utilization over time? >>>> >>>> Btrfs chunk allocator is already trying to utilize all drivers for a >>>> long long time. >>>> When allocate chunks, btrfs will choose the device with the most free >>>> space. However the nature of RAID1 needs btrfs to allocate extents from >>>> 2 different devices, which makes your replaced 4/4/6 a little complex. >>>> (If your 4/4/6 array is set up and then filled to current stage, btrfs >>>> should be able to utilize all the space) >>>> >>>> >>>> Personally speaking, if you're confident enough, just add a new device, >>>> and then do balance. >>>> If enough chunks get balanced, there should be enough space freed on >>>> existing disks. >>>> Then remove the newly added device, then btrfs should handle the >>>> remaining space well. 
>>>> >>>> Thanks, >>>> Qu >>>> >>>>> >>>>> On Sat, May 26, 2018 at 6:24 PM, Brad Templeton <bradtem@gmail.com> wrote: >>>>>> A few years ago, I encountered an issue (halfway between a bug and a >>>>>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was fairly >>>>>> full. The problem was that after replacing (by add/delete) a small drive >>>>>> with a larger one, there were now 2 full drives and one new half-full one, >>>>>> and balance was not able to correct this situation to produce the desired >>>>>> result, which is 3 drives, each with a roughly even amount of free space. >>>>>> It can't do it because the 2 smaller drives are full, and it doesn't realize >>>>>> it could just move one of the copies of a block off the smaller drive onto >>>>>> the larger drive to free space on the smaller drive, it wants to move them >>>>>> both, and there is nowhere to put them both. >>>>>> >>>>>> I'm about to do it again, taking my nearly full array which is 4TB, 4TB, 6TB >>>>>> and replacing one of the 4TB with an 8TB. I don't want to repeat the very >>>>>> time consuming situation, so I wanted to find out if things were fixed now. >>>>>> I am running Xenial (kernel 4.4.0) and could consider the upgrade to bionic >>>>>> (4.15) though that adds a lot more to my plate before a long trip and I >>>>>> would prefer to avoid if I can. >>>>>> >>>>>> So what is the best strategy: >>>>>> >>>>>> a) Replace 4TB with 8TB, resize up and balance? (This is the "basic" >>>>>> strategy) >>>>>> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks from >>>>>> 4TB but possibly not enough) >>>>>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with recently >>>>>> vacated 6TB -- much longer procedure but possibly better >>>>>> >>>>>> Or has this all been fixed and method A will work fine and get to the ideal >>>>>> goal -- 3 drives, with available space suitably distributed to allow full >>>>>> utilization over time? >>>>>> >>>>>> On Fri, Mar 25, 2016 at 7:35 AM, Henk Slager <eye1tm@gmail.com> wrote: >>>>>>> >>>>>>> On Fri, Mar 25, 2016 at 2:16 PM, Patrik Lundquist >>>>>>> <patrik.lundquist@gmail.com> wrote: >>>>>>>> On 23 March 2016 at 20:33, Chris Murphy <lists@colorremedies.com> wrote: >>>>>>>>> >>>>>>>>> On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton <bradtem@gmail.com> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> I am surprised to hear it said that having the mixed sizes is an odd >>>>>>>>>> case. >>>>>>>>> >>>>>>>>> Not odd as in wrong, just uncommon compared to other arrangements being >>>>>>>>> tested. >>>>>>>> >>>>>>>> I think mixed drive sizes in raid1 is a killer feature for a home NAS, >>>>>>>> where you replace an old smaller drive with the latest and largest >>>>>>>> when you need more storage. >>>>>>>> >>>>>>>> My raid1 currently consists of 6TB+3TB+3*2TB. >>>>>>> >>>>>>> For the original OP situation, with chunks all filled op with extents >>>>>>> and devices all filled up with chunks, 'integrating' a new 6TB drive >>>>>>> in an 4TB+3TG+2TB raid1 array could probably be done in a bit unusual >>>>>>> way in order to avoid immediate balancing needs: >>>>>>> - 'plug-in' the 6TB >>>>>>> - btrfs-replace 4TB by 6TB >>>>>>> - btrfs fi resize max 6TB_devID >>>>>>> - btrfs-replace 2TB by 4TB >>>>>>> - btrfs fi resize max 4TB_devID >>>>>>> - 'unplug' the 2TB >>>>>>> >>>>>>> So then there would be 2 devices with roughly 2TB space available, so >>>>>>> good for continued btrfs raid1 writes. 
>>>>>>> >>>>>>> An offline variant with dd instead of btrfs-replace could also be done >>>>>>> (I used to do that sometimes when btrfs-replace was not implemented). >>>>>>> My experience is that btrfs-replace speed is roughly at max speed (so >>>>>>> harddisk magnetic media transferspeed) during the whole replace >>>>>>> process and it does in a more direct way what you actually want. So in >>>>>>> total mostly way faster device replace/upgrade than with the >>>>>>> add+delete method. And raid1 redundancy is active all the time. Of >>>>>>> course it means first make sure the system runs up-to-date/latest >>>>>>> kernel+tools. >>>>>> >>>>>> >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>>>> the body of a message to majordomo@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>> >>>> >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
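[Editorial sketch] Qu's suggestion to rehearse the procedure on small disks can also be done without a VM by using loop devices; a rough sketch follows. All sizes, file names and the /mnt/test mount point are arbitrary placeholders, and loop devices are a substitution for the VM disks Qu mentions.

  truncate -s 10G d1.img d2.img d3.img                   # three sparse backing files
  DEV1=$(losetup --find --show d1.img)
  DEV2=$(losetup --find --show d2.img)
  DEV3=$(losetup --find --show d3.img)
  mkfs.btrfs -m raid1 -d raid1 "$DEV1" "$DEV2" "$DEV3"   # miniature raid1 test filesystem
  mkdir -p /mnt/test && mount "$DEV1" /mnt/test
  fallocate -l 4G /mnt/test/filler                       # fast way to consume space, as suggested
  btrfs filesystem usage /mnt/test                       # observe allocation before trying replace/balance steps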
* Re: RAID-1 refuses to balance large drive 2018-05-27 2:16 ` Qu Wenruo @ 2018-05-27 2:21 ` Brad Templeton 2018-05-27 5:55 ` Duncan 2018-05-27 18:22 ` Brad Templeton 0 siblings, 2 replies; 35+ messages in thread From: Brad Templeton @ 2018-05-27 2:21 UTC (permalink / raw) To: Qu Wenruo; +Cc: Btrfs BTRFS Certainly. My apologies for not including them before. As described, the disks are reasonably balanced -- not as full as the last time. As such, it might be enough that balance would (slowly) free up enough chunks to get things going. And if I have to, I will partially convert to single again. Certainly btrfs replace seems like the most planned and simple path but it will result in a strange distribution of the chunks.

Label: 'butter' uuid: a91755d4-87d8-4acd-ae08-c11e7f1f5438
Total devices 3 FS bytes used 6.11TiB
devid 1 size 3.62TiB used 3.47TiB path /dev/sdj2
devid 2 size 3.64TiB used 3.49TiB path /dev/sda
devid 3 size 5.43TiB used 5.28TiB path /dev/sdi2

Overall:
Device size: 12.70TiB
Device allocated: 12.25TiB
Device unallocated: 459.95GiB
Device missing: 0.00B
Used: 12.21TiB
Free (estimated): 246.35GiB (min: 246.35GiB)
Data ratio: 2.00
Metadata ratio: 2.00
Global reserve: 512.00MiB (used: 1.32MiB)

Data,RAID1: Size:6.11TiB, Used:6.09TiB
/dev/sda 3.48TiB
/dev/sdi2 5.28TiB
/dev/sdj2 3.46TiB

Metadata,RAID1: Size:14.00GiB, Used:12.38GiB
/dev/sda 8.00GiB
/dev/sdi2 7.00GiB
/dev/sdj2 13.00GiB

System,RAID1: Size:32.00MiB, Used:888.00KiB
/dev/sdi2 32.00MiB
/dev/sdj2 32.00MiB

Unallocated:
/dev/sda 153.02GiB
/dev/sdi2 154.56GiB
/dev/sdj2 152.36GiB

On Sat, May 26, 2018 at 7:16 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > > On 2018年05月27日 10:06, Brad Templeton wrote: >> Thanks. These are all things which take substantial fractions of a >> day to try, unfortunately. > > Normally I would suggest just using VM and several small disks (~10G), > along with fallocate (the fastest way to use space) to get a basic view > of the procedure. > >> Last time I ended up fixing it in a >> fairly kluged way, which was to convert from raid-1 to single long >> enough to get enough single blocks that when I converted back to >> raid-1 they got distributed to the right drives. > > Yep, that's the ultimate one-fit-all solution. > Also, this reminds me about the fact we could do the > RAID1->Single/DUP->Single downgrade in a much much faster way. > I think it's worthy considering for later enhancement. > >> But this is, aside >> from being a kludge, a procedure with some minor risk. Of course I am >> taking a backup first, but still... >> >> This strikes me as something that should be a fairly common event -- >> your raid is filling up, and so you expand it by replacing the oldest >> and smallest drive with a new much bigger one. In the old days of >> RAID, you could not do that, you had to grow all drives at the same >> time, and this is one of the ways that BTRFS is quite superior. >> When I had MD raid, I went through a strange process of always having >> a raid 5 that consisted of different sized drives. The raid-5 was >> based on the smallest of the 3 drives, and then the larger ones had >> extra space which could either be in raid-1, or more imply was in solo >> disk mode and used for less critical data (such as backups and old >> archives.) Slowly, and in a messy way, each time I replaced the >> smallest drive, I could then grow the raid 5. Yuck. BTRFS is so >> much better, except for this issue. 
>> >> So if somebody has a thought of a procedure that is fairly sure to >> work and doesn't involve too many copying passes -- copying 4tb is not >> a quick operation -- it is much appreciated and might be a good thing >> to add to a wiki page, which I would be happy to do. > > Anyway, "btrfs fi show" and "btrfs fi usage" would help before any > further advice from community. > > Thanks, > Qu > >> >> On Sat, May 26, 2018 at 6:56 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >>> >>> >>> On 2018年05月27日 09:49, Brad Templeton wrote: >>>> That is what did not work last time. >>>> >>>> I say I think there can be a "fix" because I hope the goal of BTRFS >>>> raid is to be superior to traditional RAID. That if one replaces a >>>> drive, and asks to balance, it figures out what needs to be done to >>>> make that work. I understand that the current balance algorithm may >>>> have trouble with that. In this situation, the ideal result would be >>>> the system would take the 3 drives (4TB and 6TB full, 8TB with 4TB >>>> free) and move extents strictly from the 4TB and 6TB to the 8TB -- ie >>>> extents which are currently on both the 4TB and 6TB -- by moving only >>>> one copy. >>> >>> Btrfs can only do balance in a chunk unit. >>> Thus btrfs can only do: >>> 1) Create new chunk >>> 2) Copy data >>> 3) Remove old chunk. >>> >>> So it can't do the way you mentioned. >>> But your purpose sounds pretty valid and maybe we could enhanace btrfs >>> to do such thing. >>> (Currently only replace can behave like that) >>> >>>> It is not strictly a "bug" in that the code is operating >>>> as designed, but it is an undesired function. >>>> >>>> The problem is the approach you describe did not work in the prior upgrade. >>> >>> Would you please try 4/4/6 + 4 or 4/4/6 + 2 and then balance? >>> And before/after balance, "btrfs fi usage" and "btrfs fi show" output >>> could also help. >>> >>> Thanks, >>> Qu >>> >>>> >>>> On Sat, May 26, 2018 at 6:41 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >>>>> >>>>> >>>>> On 2018年05月27日 09:27, Brad Templeton wrote: >>>>>> A few years ago, I encountered an issue (halfway between a bug and a >>>>>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was >>>>>> fairly full. The problem was that after replacing (by add/delete) a >>>>>> small drive with a larger one, there were now 2 full drives and one >>>>>> new half-full one, and balance was not able to correct this situation >>>>>> to produce the desired result, which is 3 drives, each with a roughly >>>>>> even amount of free space. It can't do it because the 2 smaller >>>>>> drives are full, and it doesn't realize it could just move one of the >>>>>> copies of a block off the smaller drive onto the larger drive to free >>>>>> space on the smaller drive, it wants to move them both, and there is >>>>>> nowhere to put them both. >>>>> >>>>> It's not that easy. >>>>> For balance, btrfs must first find a large enough space to locate both >>>>> copy, then copy data. >>>>> Or if powerloss happens, it will cause data corruption. >>>>> >>>>> So in your case, btrfs can only find enough space for one copy, thus >>>>> unable to relocate any chunk. >>>>> >>>>>> >>>>>> I'm about to do it again, taking my nearly full array which is 4TB, >>>>>> 4TB, 6TB and replacing one of the 4TB with an 8TB. I don't want to >>>>>> repeat the very time consuming situation, so I wanted to find out if >>>>>> things were fixed now. 
I am running Xenial (kernel 4.4.0) and could >>>>>> consider the upgrade to bionic (4.15) though that adds a lot more to >>>>>> my plate before a long trip and I would prefer to avoid if I can. >>>>> >>>>> Since there is nothing to fix, the behavior will not change at all. >>>>> >>>>>> >>>>>> So what is the best strategy: >>>>>> >>>>>> a) Replace 4TB with 8TB, resize up and balance? (This is the "basic" strategy) >>>>>> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks >>>>>> from 4TB but possibly not enough) >>>>>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with >>>>>> recently vacated 6TB -- much longer procedure but possibly better >>>>>> >>>>>> Or has this all been fixed and method A will work fine and get to the >>>>>> ideal goal -- 3 drives, with available space suitably distributed to >>>>>> allow full utilization over time? >>>>> >>>>> Btrfs chunk allocator is already trying to utilize all drivers for a >>>>> long long time. >>>>> When allocate chunks, btrfs will choose the device with the most free >>>>> space. However the nature of RAID1 needs btrfs to allocate extents from >>>>> 2 different devices, which makes your replaced 4/4/6 a little complex. >>>>> (If your 4/4/6 array is set up and then filled to current stage, btrfs >>>>> should be able to utilize all the space) >>>>> >>>>> >>>>> Personally speaking, if you're confident enough, just add a new device, >>>>> and then do balance. >>>>> If enough chunks get balanced, there should be enough space freed on >>>>> existing disks. >>>>> Then remove the newly added device, then btrfs should handle the >>>>> remaining space well. >>>>> >>>>> Thanks, >>>>> Qu >>>>> >>>>>> >>>>>> On Sat, May 26, 2018 at 6:24 PM, Brad Templeton <bradtem@gmail.com> wrote: >>>>>>> A few years ago, I encountered an issue (halfway between a bug and a >>>>>>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was fairly >>>>>>> full. The problem was that after replacing (by add/delete) a small drive >>>>>>> with a larger one, there were now 2 full drives and one new half-full one, >>>>>>> and balance was not able to correct this situation to produce the desired >>>>>>> result, which is 3 drives, each with a roughly even amount of free space. >>>>>>> It can't do it because the 2 smaller drives are full, and it doesn't realize >>>>>>> it could just move one of the copies of a block off the smaller drive onto >>>>>>> the larger drive to free space on the smaller drive, it wants to move them >>>>>>> both, and there is nowhere to put them both. >>>>>>> >>>>>>> I'm about to do it again, taking my nearly full array which is 4TB, 4TB, 6TB >>>>>>> and replacing one of the 4TB with an 8TB. I don't want to repeat the very >>>>>>> time consuming situation, so I wanted to find out if things were fixed now. >>>>>>> I am running Xenial (kernel 4.4.0) and could consider the upgrade to bionic >>>>>>> (4.15) though that adds a lot more to my plate before a long trip and I >>>>>>> would prefer to avoid if I can. >>>>>>> >>>>>>> So what is the best strategy: >>>>>>> >>>>>>> a) Replace 4TB with 8TB, resize up and balance? 
(This is the "basic" >>>>>>> strategy) >>>>>>> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks from >>>>>>> 4TB but possibly not enough) >>>>>>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with recently >>>>>>> vacated 6TB -- much longer procedure but possibly better >>>>>>> >>>>>>> Or has this all been fixed and method A will work fine and get to the ideal >>>>>>> goal -- 3 drives, with available space suitably distributed to allow full >>>>>>> utilization over time? >>>>>>> >>>>>>> On Fri, Mar 25, 2016 at 7:35 AM, Henk Slager <eye1tm@gmail.com> wrote: >>>>>>>> >>>>>>>> On Fri, Mar 25, 2016 at 2:16 PM, Patrik Lundquist >>>>>>>> <patrik.lundquist@gmail.com> wrote: >>>>>>>>> On 23 March 2016 at 20:33, Chris Murphy <lists@colorremedies.com> wrote: >>>>>>>>>> >>>>>>>>>> On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton <bradtem@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> I am surprised to hear it said that having the mixed sizes is an odd >>>>>>>>>>> case. >>>>>>>>>> >>>>>>>>>> Not odd as in wrong, just uncommon compared to other arrangements being >>>>>>>>>> tested. >>>>>>>>> >>>>>>>>> I think mixed drive sizes in raid1 is a killer feature for a home NAS, >>>>>>>>> where you replace an old smaller drive with the latest and largest >>>>>>>>> when you need more storage. >>>>>>>>> >>>>>>>>> My raid1 currently consists of 6TB+3TB+3*2TB. >>>>>>>> >>>>>>>> For the original OP situation, with chunks all filled op with extents >>>>>>>> and devices all filled up with chunks, 'integrating' a new 6TB drive >>>>>>>> in an 4TB+3TG+2TB raid1 array could probably be done in a bit unusual >>>>>>>> way in order to avoid immediate balancing needs: >>>>>>>> - 'plug-in' the 6TB >>>>>>>> - btrfs-replace 4TB by 6TB >>>>>>>> - btrfs fi resize max 6TB_devID >>>>>>>> - btrfs-replace 2TB by 4TB >>>>>>>> - btrfs fi resize max 4TB_devID >>>>>>>> - 'unplug' the 2TB >>>>>>>> >>>>>>>> So then there would be 2 devices with roughly 2TB space available, so >>>>>>>> good for continued btrfs raid1 writes. >>>>>>>> >>>>>>>> An offline variant with dd instead of btrfs-replace could also be done >>>>>>>> (I used to do that sometimes when btrfs-replace was not implemented). >>>>>>>> My experience is that btrfs-replace speed is roughly at max speed (so >>>>>>>> harddisk magnetic media transferspeed) during the whole replace >>>>>>>> process and it does in a more direct way what you actually want. So in >>>>>>>> total mostly way faster device replace/upgrade than with the >>>>>>>> add+delete method. And raid1 redundancy is active all the time. Of >>>>>>>> course it means first make sure the system runs up-to-date/latest >>>>>>>> kernel+tools. >>>>>>> >>>>>>> >>>>>> -- >>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>>>>> the body of a message to majordomo@vger.kernel.org >>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>> >>>>> >>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > ^ permalink raw reply [flat|nested] 35+ messages in thread
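[Editorial sketch] The "balance would (slowly) free up enough chunks" idea in the message above is usually attempted with usage-filtered balances, which relocate only lightly filled chunks; whether it frees much on a nearly full array is uncertain. The thresholds below are illustrative and /mnt is a placeholder mount point.

  btrfs balance start -dusage=25 /mnt   # relocate only data chunks that are less than 25% full
  btrfs balance start -dusage=55 /mnt   # raise the threshold in steps if more chunks need repacking
  btrfs filesystem usage /mnt           # re-check per-device unallocated space after each pass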
* Re: RAID-1 refuses to balance large drive 2018-05-27 2:21 ` Brad Templeton @ 2018-05-27 5:55 ` Duncan 2018-05-27 18:22 ` Brad Templeton 1 sibling, 0 replies; 35+ messages in thread From: Duncan @ 2018-05-27 5:55 UTC (permalink / raw) To: linux-btrfs Brad Templeton posted on Sat, 26 May 2018 19:21:57 -0700 as excerpted: > Certainly. My apologies for not including them before. Aieee! Reply before quote, making the reply out of context, and my attempt to reply in context... difficult and troublesome. Please use standard list context-quote, reply in context, next time, making it easier for further replies also in context. > As > described, the disks are reasonably balanced -- not as full as the > last time. As such, it might be enough that balance would (slowly) > free up enough chunks to get things going. And if I have to, I will > partially convert to single again. Certainly btrfs replace seems > like the most planned and simple path but it will result in a strange > distribution of the chunks. [btrfs filesystem usage output below] > Label: 'butter' uuid: a91755d4-87d8-4acd-ae08-c11e7f1f5438 > Total devices 3 FS bytes used 6.11TiB > devid 1 size 3.62TiB used 3.47TiB path /dev/sdj2Overall: > Device size: 12.70TiB > Device allocated: 12.25TiB > Device unallocated: 459.95GiB > Device missing: 0.00B > Used: 12.21TiB > Free (estimated): 246.35GiB (min: 246.35GiB) > Data ratio: 2.00 > Metadata ratio: 2.00 > Global reserve: 512.00MiB (used: 1.32MiB) > > Data,RAID1: Size:6.11TiB, Used:6.09TiB > /dev/sda 3.48TiB > /dev/sdi2 5.28TiB > /dev/sdj2 3.46TiB > > Metadata,RAID1: Size:14.00GiB, Used:12.38GiB > /dev/sda 8.00GiB > /dev/sdi2 7.00GiB > /dev/sdj2 13.00GiB > > System,RAID1: Size:32.00MiB, Used:888.00KiB > /dev/sdi2 32.00MiB > /dev/sdj2 32.00MiB > > Unallocated: > /dev/sda 153.02GiB > /dev/sdi2 154.56GiB > /dev/sdj2 152.36GiB [Presumably this is a bit of btrfs filesystem show output, but the rest of it is missing...] > devid 2 size 3.64TiB used 3.49TiB path /dev/sda > devid 3 size 5.43TiB used 5.28TiB path /dev/sdi2 Based on the 100+ GiB still free on each of the three devices above, you should have no issues balancing after replacing one of them. Presumably the first time you tried it, there was far less, likely under a GiB free on the two not replaced. Since data chunks are nominally 1 GiB each and raid1 requires two copies, each on a different device, that didn't leave enough space on either of the older devices to do a balance, even tho there was plenty of space left on the just-replaced new one. (Tho multiple-GiB chunks are possible on TB+ devices, but 10 GiB free on each device should be plenty, so 100+ GiB free on each... should be no issues unless you run into some strange bug.) Meanwhile, even in the case of not enough space free on all three existing devices, given that they're currently two 4 TB devices and a 6 TB device and that you're replacing one of the 4 TB devices with an 8 TB device... Doing a two-step replace, first replacing the 6 TB device with the new 8 TB device, then resizing to the new 8 TB size, giving you ~2 TB of free space on it, then replacing one of the 4 TB devices with the now free 6 TB device, and again resizing to the new 6 TB size, giving you ~2 TB free on it too, thus giving you ~2 TB free on each of two devices instead of all 4 TB of new space on a single device, should do the trick very well, and should still be faster, probably MUCH faster, than doing a temporary convert to single, then back to raid1, the kludge you used last time. 
=:^) Meanwhile, while kernel version of course remains up to you, given that you mentioned 4.4 with a potential upgrade to 4.15, I will at least cover the following, so you'll have it to use as you decide on kernel versions. 4.15? Why? 4.14 is the current mainline LTS kernel series, with 4.15 only being a normal short-term stable series that has already been EOLed. So 4.15 now makes little sense at all. Either go current-stable series and do 4.16 and continue to upgrade as the new kernels come (4.17 should be out shortly as it's past rc6, with rc7 likely out by the time you read this and release likely in a week), or stick with 4.14 LTS for the longer-term support. Of course you can go with your distro kernel if you like, and I presume that's why you mentioned 4.15, but as I said it's already EOLed upstream, and of course this list being a kernel development list, our focus tends to be on upstream/mainstream, not distro level kernels. If you choose a distro level kernel series that's EOLed at kernel.org, then you really should be getting support from them for it, as they know what they've backported and/or patched and are thus best positioned to support it. As for what this list does try to support, it's the last two kernel release series in each of the current and LTS tracks. So as the first release back from current 4.16, 4.15, tho EOLed upstream, is still reasonably supported for the moment here, tho people should be upgrading to 4.16 by now as 4.17 should be out in a couple weeks or so and 4.15 would be out of the two-current-kernel-series window at that time. Meanwhile, the two latest LTS series are as already stated 4.14, and the earlier 4.9. 4.4 is the one previous to that and it's still mainline supported in general, but it's out of the two LTS-series window of best support here, and truth be told, based on history, even supporting the second newest LTS series starts to get more difficult at about a year and a half out, 6 months or so before the next LTS comes out. As it happens that's about where 4.9 is now, and 4.14 has had about 6 months to stabilize now, so for LTS I'd definitely recommend 4.14, now. Of course that doesn't mean that we /refuse/ to support 4.4, we still try, but it's out of primary focus now and in many cases, should you have problems, the first recommendation is going to be try something newer and see if the problem goes away or presents differently. Or as mentioned, check with your distro if it's a distro kernel, since in that case they're best positioned to support it. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 35+ messages in thread
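[Editorial sketch] Duncan's two-step replacement above can be written out roughly as follows. Device names and devids are placeholders, /mnt stands in for the real mount point, and the second replace assumes the freed 6TB drive is reused as the target (hence -f to overwrite its old btrfs signature).

  btrfs replace start /dev/SIX_TB /dev/EIGHT_TB /mnt     # step 1: migrate the 6TB's contents to the new 8TB
  btrfs filesystem resize <8TB_devid>:max /mnt           # grow it, leaving roughly 2TB unallocated on that device
  btrfs replace start -f /dev/FOUR_TB /dev/SIX_TB /mnt   # step 2: reuse the freed 6TB in place of one 4TB
  btrfs filesystem resize <6TB_devid>:max /mnt           # grow it too, for roughly 2TB unallocated here as well
  btrfs filesystem usage /mnt                            # both enlarged devices should now show free space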
* Re: RAID-1 refuses to balance large drive 2018-05-27 2:21 ` Brad Templeton 2018-05-27 5:55 ` Duncan @ 2018-05-27 18:22 ` Brad Templeton 2018-05-28 8:31 ` Duncan 1 sibling, 1 reply; 35+ messages in thread From: Brad Templeton @ 2018-05-27 18:22 UTC (permalink / raw) To: Qu Wenruo; +Cc: Btrfs BTRFS BTW, I decided to follow the original double replace strategy suggested -- replace 6TB with 8TB and replace 4TB with 6TB. That should be sure to leave the 2 large drives each with 2TB free once expanded, and thus able to fully use all space. However, the first one has been going for 9 hours and is "189.7% done" and still going. Some sort of bug in calculating the completion status, obviously. With luck 200% will be enough? On Sat, May 26, 2018 at 7:21 PM, Brad Templeton <bradtem@gmail.com> wrote: > Certainly. My apologies for not including them before. As > described, the disks are reasonably balanced -- not as full as the > last time. As such, it might be enough that balance would (slowly) > free up enough chunks to get things going. And if I have to, I will > partially convert to single again. Certainly btrfs replace seems > like the most planned and simple path but it will result in a strange > distribution of the chunks. > > Label: 'butter' uuid: a91755d4-87d8-4acd-ae08-c11e7f1f5438 > Total devices 3 FS bytes used 6.11TiB > devid 1 size 3.62TiB used 3.47TiB path /dev/sdj2Overall: > Device size: 12.70TiB > Device allocated: 12.25TiB > Device unallocated: 459.95GiB > Device missing: 0.00B > Used: 12.21TiB > Free (estimated): 246.35GiB (min: 246.35GiB) > Data ratio: 2.00 > Metadata ratio: 2.00 > Global reserve: 512.00MiB (used: 1.32MiB) > > Data,RAID1: Size:6.11TiB, Used:6.09TiB > /dev/sda 3.48TiB > /dev/sdi2 5.28TiB > /dev/sdj2 3.46TiB > > Metadata,RAID1: Size:14.00GiB, Used:12.38GiB > /dev/sda 8.00GiB > /dev/sdi2 7.00GiB > /dev/sdj2 13.00GiB > > System,RAID1: Size:32.00MiB, Used:888.00KiB > /dev/sdi2 32.00MiB > /dev/sdj2 32.00MiB > > Unallocated: > /dev/sda 153.02GiB > /dev/sdi2 154.56GiB > /dev/sdj2 152.36GiB > > devid 2 size 3.64TiB used 3.49TiB path /dev/sda > devid 3 size 5.43TiB used 5.28TiB path /dev/sdi2 > > > On Sat, May 26, 2018 at 7:16 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >> >> >> On 2018年05月27日 10:06, Brad Templeton wrote: >>> Thanks. These are all things which take substantial fractions of a >>> day to try, unfortunately. >> >> Normally I would suggest just using VM and several small disks (~10G), >> along with fallocate (the fastest way to use space) to get a basic view >> of the procedure. >> >>> Last time I ended up fixing it in a >>> fairly kluged way, which was to convert from raid-1 to single long >>> enough to get enough single blocks that when I converted back to >>> raid-1 they got distributed to the right drives. >> >> Yep, that's the ultimate one-fit-all solution. >> Also, this reminds me about the fact we could do the >> RAID1->Single/DUP->Single downgrade in a much much faster way. >> I think it's worthy considering for later enhancement. >> >>> But this is, aside >>> from being a kludge, a procedure with some minor risk. Of course I am >>> taking a backup first, but still... >>> >>> This strikes me as something that should be a fairly common event -- >>> your raid is filling up, and so you expand it by replacing the oldest >>> and smallest drive with a new much bigger one. In the old days of >>> RAID, you could not do that, you had to grow all drives at the same >>> time, and this is one of the ways that BTRFS is quite superior. 
>>> When I had MD raid, I went through a strange process of always having >>> a raid 5 that consisted of different sized drives. The raid-5 was >>> based on the smallest of the 3 drives, and then the larger ones had >>> extra space which could either be in raid-1, or more imply was in solo >>> disk mode and used for less critical data (such as backups and old >>> archives.) Slowly, and in a messy way, each time I replaced the >>> smallest drive, I could then grow the raid 5. Yuck. BTRFS is so >>> much better, except for this issue. >>> >>> So if somebody has a thought of a procedure that is fairly sure to >>> work and doesn't involve too many copying passes -- copying 4tb is not >>> a quick operation -- it is much appreciated and might be a good thing >>> to add to a wiki page, which I would be happy to do. >> >> Anyway, "btrfs fi show" and "btrfs fi usage" would help before any >> further advice from community. >> >> Thanks, >> Qu >> >>> >>> On Sat, May 26, 2018 at 6:56 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >>>> >>>> >>>> On 2018年05月27日 09:49, Brad Templeton wrote: >>>>> That is what did not work last time. >>>>> >>>>> I say I think there can be a "fix" because I hope the goal of BTRFS >>>>> raid is to be superior to traditional RAID. That if one replaces a >>>>> drive, and asks to balance, it figures out what needs to be done to >>>>> make that work. I understand that the current balance algorithm may >>>>> have trouble with that. In this situation, the ideal result would be >>>>> the system would take the 3 drives (4TB and 6TB full, 8TB with 4TB >>>>> free) and move extents strictly from the 4TB and 6TB to the 8TB -- ie >>>>> extents which are currently on both the 4TB and 6TB -- by moving only >>>>> one copy. >>>> >>>> Btrfs can only do balance in a chunk unit. >>>> Thus btrfs can only do: >>>> 1) Create new chunk >>>> 2) Copy data >>>> 3) Remove old chunk. >>>> >>>> So it can't do the way you mentioned. >>>> But your purpose sounds pretty valid and maybe we could enhanace btrfs >>>> to do such thing. >>>> (Currently only replace can behave like that) >>>> >>>>> It is not strictly a "bug" in that the code is operating >>>>> as designed, but it is an undesired function. >>>>> >>>>> The problem is the approach you describe did not work in the prior upgrade. >>>> >>>> Would you please try 4/4/6 + 4 or 4/4/6 + 2 and then balance? >>>> And before/after balance, "btrfs fi usage" and "btrfs fi show" output >>>> could also help. >>>> >>>> Thanks, >>>> Qu >>>> >>>>> >>>>> On Sat, May 26, 2018 at 6:41 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >>>>>> >>>>>> >>>>>> On 2018年05月27日 09:27, Brad Templeton wrote: >>>>>>> A few years ago, I encountered an issue (halfway between a bug and a >>>>>>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was >>>>>>> fairly full. The problem was that after replacing (by add/delete) a >>>>>>> small drive with a larger one, there were now 2 full drives and one >>>>>>> new half-full one, and balance was not able to correct this situation >>>>>>> to produce the desired result, which is 3 drives, each with a roughly >>>>>>> even amount of free space. It can't do it because the 2 smaller >>>>>>> drives are full, and it doesn't realize it could just move one of the >>>>>>> copies of a block off the smaller drive onto the larger drive to free >>>>>>> space on the smaller drive, it wants to move them both, and there is >>>>>>> nowhere to put them both. >>>>>> >>>>>> It's not that easy. 
>>>>>> For balance, btrfs must first find a large enough space to locate both >>>>>> copy, then copy data. >>>>>> Or if powerloss happens, it will cause data corruption. >>>>>> >>>>>> So in your case, btrfs can only find enough space for one copy, thus >>>>>> unable to relocate any chunk. >>>>>> >>>>>>> >>>>>>> I'm about to do it again, taking my nearly full array which is 4TB, >>>>>>> 4TB, 6TB and replacing one of the 4TB with an 8TB. I don't want to >>>>>>> repeat the very time consuming situation, so I wanted to find out if >>>>>>> things were fixed now. I am running Xenial (kernel 4.4.0) and could >>>>>>> consider the upgrade to bionic (4.15) though that adds a lot more to >>>>>>> my plate before a long trip and I would prefer to avoid if I can. >>>>>> >>>>>> Since there is nothing to fix, the behavior will not change at all. >>>>>> >>>>>>> >>>>>>> So what is the best strategy: >>>>>>> >>>>>>> a) Replace 4TB with 8TB, resize up and balance? (This is the "basic" strategy) >>>>>>> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks >>>>>>> from 4TB but possibly not enough) >>>>>>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with >>>>>>> recently vacated 6TB -- much longer procedure but possibly better >>>>>>> >>>>>>> Or has this all been fixed and method A will work fine and get to the >>>>>>> ideal goal -- 3 drives, with available space suitably distributed to >>>>>>> allow full utilization over time? >>>>>> >>>>>> Btrfs chunk allocator is already trying to utilize all drivers for a >>>>>> long long time. >>>>>> When allocate chunks, btrfs will choose the device with the most free >>>>>> space. However the nature of RAID1 needs btrfs to allocate extents from >>>>>> 2 different devices, which makes your replaced 4/4/6 a little complex. >>>>>> (If your 4/4/6 array is set up and then filled to current stage, btrfs >>>>>> should be able to utilize all the space) >>>>>> >>>>>> >>>>>> Personally speaking, if you're confident enough, just add a new device, >>>>>> and then do balance. >>>>>> If enough chunks get balanced, there should be enough space freed on >>>>>> existing disks. >>>>>> Then remove the newly added device, then btrfs should handle the >>>>>> remaining space well. >>>>>> >>>>>> Thanks, >>>>>> Qu >>>>>> >>>>>>> >>>>>>> On Sat, May 26, 2018 at 6:24 PM, Brad Templeton <bradtem@gmail.com> wrote: >>>>>>>> A few years ago, I encountered an issue (halfway between a bug and a >>>>>>>> problem) with attempting to grow a BTRFS 3 disk Raid 1 which was fairly >>>>>>>> full. The problem was that after replacing (by add/delete) a small drive >>>>>>>> with a larger one, there were now 2 full drives and one new half-full one, >>>>>>>> and balance was not able to correct this situation to produce the desired >>>>>>>> result, which is 3 drives, each with a roughly even amount of free space. >>>>>>>> It can't do it because the 2 smaller drives are full, and it doesn't realize >>>>>>>> it could just move one of the copies of a block off the smaller drive onto >>>>>>>> the larger drive to free space on the smaller drive, it wants to move them >>>>>>>> both, and there is nowhere to put them both. >>>>>>>> >>>>>>>> I'm about to do it again, taking my nearly full array which is 4TB, 4TB, 6TB >>>>>>>> and replacing one of the 4TB with an 8TB. I don't want to repeat the very >>>>>>>> time consuming situation, so I wanted to find out if things were fixed now. 
>>>>>>>> I am running Xenial (kernel 4.4.0) and could consider the upgrade to bionic >>>>>>>> (4.15) though that adds a lot more to my plate before a long trip and I >>>>>>>> would prefer to avoid if I can. >>>>>>>> >>>>>>>> So what is the best strategy: >>>>>>>> >>>>>>>> a) Replace 4TB with 8TB, resize up and balance? (This is the "basic" >>>>>>>> strategy) >>>>>>>> b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks from >>>>>>>> 4TB but possibly not enough) >>>>>>>> c) Replace 6TB with 8TB, resize/balance, then replace 4TB with recently >>>>>>>> vacated 6TB -- much longer procedure but possibly better >>>>>>>> >>>>>>>> Or has this all been fixed and method A will work fine and get to the ideal >>>>>>>> goal -- 3 drives, with available space suitably distributed to allow full >>>>>>>> utilization over time? >>>>>>>> >>>>>>>> On Fri, Mar 25, 2016 at 7:35 AM, Henk Slager <eye1tm@gmail.com> wrote: >>>>>>>>> >>>>>>>>> On Fri, Mar 25, 2016 at 2:16 PM, Patrik Lundquist >>>>>>>>> <patrik.lundquist@gmail.com> wrote: >>>>>>>>>> On 23 March 2016 at 20:33, Chris Murphy <lists@colorremedies.com> wrote: >>>>>>>>>>> >>>>>>>>>>> On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton <bradtem@gmail.com> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> I am surprised to hear it said that having the mixed sizes is an odd >>>>>>>>>>>> case. >>>>>>>>>>> >>>>>>>>>>> Not odd as in wrong, just uncommon compared to other arrangements being >>>>>>>>>>> tested. >>>>>>>>>> >>>>>>>>>> I think mixed drive sizes in raid1 is a killer feature for a home NAS, >>>>>>>>>> where you replace an old smaller drive with the latest and largest >>>>>>>>>> when you need more storage. >>>>>>>>>> >>>>>>>>>> My raid1 currently consists of 6TB+3TB+3*2TB. >>>>>>>>> >>>>>>>>> For the original OP situation, with chunks all filled op with extents >>>>>>>>> and devices all filled up with chunks, 'integrating' a new 6TB drive >>>>>>>>> in an 4TB+3TG+2TB raid1 array could probably be done in a bit unusual >>>>>>>>> way in order to avoid immediate balancing needs: >>>>>>>>> - 'plug-in' the 6TB >>>>>>>>> - btrfs-replace 4TB by 6TB >>>>>>>>> - btrfs fi resize max 6TB_devID >>>>>>>>> - btrfs-replace 2TB by 4TB >>>>>>>>> - btrfs fi resize max 4TB_devID >>>>>>>>> - 'unplug' the 2TB >>>>>>>>> >>>>>>>>> So then there would be 2 devices with roughly 2TB space available, so >>>>>>>>> good for continued btrfs raid1 writes. >>>>>>>>> >>>>>>>>> An offline variant with dd instead of btrfs-replace could also be done >>>>>>>>> (I used to do that sometimes when btrfs-replace was not implemented). >>>>>>>>> My experience is that btrfs-replace speed is roughly at max speed (so >>>>>>>>> harddisk magnetic media transferspeed) during the whole replace >>>>>>>>> process and it does in a more direct way what you actually want. So in >>>>>>>>> total mostly way faster device replace/upgrade than with the >>>>>>>>> add+delete method. And raid1 redundancy is active all the time. Of >>>>>>>>> course it means first make sure the system runs up-to-date/latest >>>>>>>>> kernel+tools. 
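Spelled out as commands, the quoted replace-and-resize sequence would look roughly like the sketch below. The device paths and devid numbers are placeholders rather than values from this thread, the filesystem is assumed to be mounted at /mnt, and each replace must finish before the resize that follows it:

  # replace the old 4TB (devid 1 in this example) with the new 6TB
  btrfs replace start 1 /dev/sdX /mnt
  btrfs replace status /mnt            # monitor until the replace finishes
  btrfs filesystem resize 1:max /mnt   # grow devid 1 to use the whole 6TB
  # replace the old 2TB (devid 2 here) with the just-freed 4TB; -f is needed
  # because the freed drive still carries a stale btrfs signature
  btrfs replace start -f 2 /dev/sdY /mnt
  btrfs filesystem resize 2:max /mnt

No separate 'btrfs device delete' step is involved, since replace swaps the devices in place, and the old 2TB can simply be unplugged afterwards.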
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: RAID-1 refuses to balance large drive 2018-05-27 18:22 ` Brad Templeton @ 2018-05-28 8:31 ` Duncan 0 siblings, 0 replies; 35+ messages in thread From: Duncan @ 2018-05-28 8:31 UTC (permalink / raw) To: linux-btrfs Brad Templeton posted on Sun, 27 May 2018 11:22:07 -0700 as excerpted: > BTW, I decided to follow the original double replace strategy suggested -- > replace 6TB with 8TB and replace 4TB with 6TB. That should be sure to > leave the 2 large drives each with 2TB free once expanded, and thus able > to fully use all space. > > However, the first one has been going for 9 hours and is "189.7% done" > and still going. Some sort of bug in calculating the completion > status, obviously. With luck 200% will be enough? IIRC there was an over-100% completion status bug fixed, I'd guess about 18 months to two years ago now, long enough that it would have slipped regulars' minds, so nobody would have thought about it even knowing you're still on 4.4, that being one of the reasons we don't do as well supporting stuff that old. If it is indeed the same bug, anything even half modern should have it fixed. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: RAID-1 refuses to balance large drive 2018-05-27 1:27 ` Brad Templeton 2018-05-27 1:41 ` Qu Wenruo @ 2018-06-08 3:23 ` Zygo Blaxell 1 sibling, 0 replies; 35+ messages in thread From: Zygo Blaxell @ 2018-06-08 3:23 UTC (permalink / raw) To: Brad Templeton; +Cc: Btrfs BTRFS [-- Attachment #1: Type: text/plain, Size: 7601 bytes --] On Sat, May 26, 2018 at 06:27:57PM -0700, Brad Templeton wrote: > A few years ago, I encountered an issue (halfway between a bug and a > problem) with attempting to grow a BTRFS 3 disk Raid 1 which was > fairly full. The problem was that after replacing (by add/delete) a > small drive with a larger one, there were now 2 full drives and one > new half-full one, and balance was not able to correct this situation > to produce the desired result, which is 3 drives, each with a roughly > even amount of free space. It can't do it because the 2 smaller > drives are full, and it doesn't realize it could just move one of the > copies of a block off the smaller drive onto the larger drive to free > space on the smaller drive, it wants to move them both, and there is > nowhere to put them both. > > I'm about to do it again, taking my nearly full array which is 4TB, > 4TB, 6TB and replacing one of the 4TB with an 8TB. I don't want to > repeat the very time consuming situation, so I wanted to find out if > things were fixed now. I am running Xenial (kernel 4.4.0) and could > consider the upgrade to bionic (4.15) though that adds a lot more to > my plate before a long trip and I would prefer to avoid if I can. > > So what is the best strategy: > > a) Replace 4TB with 8TB, resize up and balance? (This is the "basic" strategy) > b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks > from 4TB but possibly not enough) > c) Replace 6TB with 8TB, resize/balance, then replace 4TB with > recently vacated 6TB -- much longer procedure but possibly better d) Run "btrfs balance start -dlimit=3 /fs" to make some unallocated space on all drives *before* adding disks. Then replace, resize up, and balance until unallocated space on all disks are equal. There is no need to continue balancing after that, so once that point is reached you can cancel the balance. A number of bad things can happen when unallocated space goes to zero, and being unable to expand a raid1 array is only one of them. Avoid that situation even when not resizing the array, because some cases can be very difficult to get out of. Assuming your disk is not filled to the last gigabyte, you'll be able to keep at least 1GB unallocated on every disk at all times. Monitor the amount of unallocated space and balance a few data block groups (e.g. -dlimit=3) whenever unallocated space gets low. A potential btrfs enhancement area: allow the 'devid' parameter of balance to specify two disks to balance block groups that contain chunks on both disks. We want to balance only those block groups that consist of one chunk on each smaller drive. This redistributes those block groups to have one chunk on the large disk and one chunk on one of the smaller disks, freeing space on the other small disk for the next block group. Block groups that consist of a chunk on the big disk and one of the small disks are already in the desired configuration, so rebalancing them is just a waste of time. Currently it's only possible to do this by writing a script to select individual block groups with python-btrfs or similar--much faster than plain btrfs balance for this case, but more involved to set up. 
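As a rough illustration of the monitoring routine described above, a cron-able sketch might look like the following; the mount point and threshold are invented for the example rather than taken from this thread:

  #!/bin/sh
  # Balance a few data block groups whenever unallocated space runs low,
  # so no device ever becomes fully allocated.
  FS=/mnt/array                              # example mount point
  MIN_UNALLOC=$((64 * 1024 * 1024 * 1024))   # example threshold: 64 GiB
  # "btrfs filesystem usage -b" reports sizes in bytes; take the
  # filesystem-wide "Device unallocated" figure from the Overall section.
  unalloc=$(btrfs filesystem usage -b "$FS" | awk '/Device unallocated:/ {print $3; exit}')
  if [ "$unalloc" -lt "$MIN_UNALLOC" ]; then
      btrfs balance start -dlimit=3 "$FS"
  fi

For the per-block-group selection mentioned above, a similar script could feed individual block group start addresses to balance's vrange filter, after enumerating block groups and their device stripes with something like python-btrfs.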
> Or has this all been fixed and method A will work fine and get to the > ideal goal -- 3 drives, with available space suitably distributed to > allow full utilization over time? > > On Sat, May 26, 2018 at 6:24 PM, Brad Templeton <bradtem@gmail.com> wrote: > > A few years ago, I encountered an issue (halfway between a bug and a > > problem) with attempting to grow a BTRFS 3 disk Raid 1 which was fairly > > full. The problem was that after replacing (by add/delete) a small drive > > with a larger one, there were now 2 full drives and one new half-full one, > > and balance was not able to correct this situation to produce the desired > > result, which is 3 drives, each with a roughly even amount of free space. > > It can't do it because the 2 smaller drives are full, and it doesn't realize > > it could just move one of the copies of a block off the smaller drive onto > > the larger drive to free space on the smaller drive, it wants to move them > > both, and there is nowhere to put them both. > > > > I'm about to do it again, taking my nearly full array which is 4TB, 4TB, 6TB > > and replacing one of the 4TB with an 8TB. I don't want to repeat the very > > time consuming situation, so I wanted to find out if things were fixed now. > > I am running Xenial (kernel 4.4.0) and could consider the upgrade to bionic > > (4.15) though that adds a lot more to my plate before a long trip and I > > would prefer to avoid if I can. > > > > So what is the best strategy: > > > > a) Replace 4TB with 8TB, resize up and balance? (This is the "basic" > > strategy) > > b) Add 8TB, balance, remove 4TB (automatic distribution of some blocks from > > 4TB but possibly not enough) > > c) Replace 6TB with 8TB, resize/balance, then replace 4TB with recently > > vacated 6TB -- much longer procedure but possibly better > > > > Or has this all been fixed and method A will work fine and get to the ideal > > goal -- 3 drives, with available space suitably distributed to allow full > > utilization over time? > > > > On Fri, Mar 25, 2016 at 7:35 AM, Henk Slager <eye1tm@gmail.com> wrote: > >> > >> On Fri, Mar 25, 2016 at 2:16 PM, Patrik Lundquist > >> <patrik.lundquist@gmail.com> wrote: > >> > On 23 March 2016 at 20:33, Chris Murphy <lists@colorremedies.com> wrote: > >> >> > >> >> On Wed, Mar 23, 2016 at 1:10 PM, Brad Templeton <bradtem@gmail.com> > >> >> wrote: > >> >> > > >> >> > I am surprised to hear it said that having the mixed sizes is an odd > >> >> > case. > >> >> > >> >> Not odd as in wrong, just uncommon compared to other arrangements being > >> >> tested. > >> > > >> > I think mixed drive sizes in raid1 is a killer feature for a home NAS, > >> > where you replace an old smaller drive with the latest and largest > >> > when you need more storage. > >> > > >> > My raid1 currently consists of 6TB+3TB+3*2TB. > >> > >> For the original OP situation, with chunks all filled op with extents > >> and devices all filled up with chunks, 'integrating' a new 6TB drive > >> in an 4TB+3TG+2TB raid1 array could probably be done in a bit unusual > >> way in order to avoid immediate balancing needs: > >> - 'plug-in' the 6TB > >> - btrfs-replace 4TB by 6TB > >> - btrfs fi resize max 6TB_devID > >> - btrfs-replace 2TB by 4TB > >> - btrfs fi resize max 4TB_devID > >> - 'unplug' the 2TB > >> > >> So then there would be 2 devices with roughly 2TB space available, so > >> good for continued btrfs raid1 writes. 
> >> > >> An offline variant with dd instead of btrfs-replace could also be done > >> (I used to do that sometimes when btrfs-replace was not implemented). > >> My experience is that btrfs-replace speed is roughly at max speed (so > >> harddisk magnetic media transferspeed) during the whole replace > >> process and it does in a more direct way what you actually want. So in > >> total mostly way faster device replace/upgrade than with the > >> add+delete method. And raid1 redundancy is active all the time. Of > >> course it means first make sure the system runs up-to-date/latest > >> kernel+tools. > > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: RAID-1 refuses to balance large drive 2016-03-25 13:16 ` Patrik Lundquist 2016-03-25 14:35 ` Henk Slager @ 2016-03-27 4:23 ` Brad Templeton 1 sibling, 0 replies; 35+ messages in thread From: Brad Templeton @ 2016-03-27 4:23 UTC (permalink / raw) Cc: Btrfs BTRFS For those curious as to the result, the reduction to single and restoration to RAID1 did indeed balance the array. It was extremely slow of course on a 12TB array. I did not bother doing this with the metadata. I also stopped the conversion to single when it had freed up enough space on the 2 smaller drives, because at that time it was moving stuff into the big drive, which seemed sub-optimal considering what was to come. In general, obviously, I hope the long term goal is to not need this, indeed not to need manual balance at all. I would hope the goal is to just be able to add and remove drives, tell the system what type of redundancy you need and let it figure out the rest. But I know this is an FS in development. I've actually come to feel that when it comes to personal drive arrays, we actually need something much smarter than today's filesystems. Truth is, for example, that once my infrequently accessed files, such as old photo and video archives, have a solid backup made, there is not actually a need to keep them redundantly at all, except for speed, while the much smaller volume of frequently accessed files needs that (or even extra redundancy not for safety but extra speed, and of course cache on an SSD is even better.) This requires not just the filesystem and OS to get smarter about this, but even the apps. It may happen some day -- no matter how cheap storage gets, we keep coming up with ways to fill it. Thanks for the help. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: RAID-1 refuses to balance large drive 2016-03-23 19:10 ` Brad Templeton 2016-03-23 19:27 ` Alexander Fougner 2016-03-23 19:33 ` Chris Murphy @ 2016-03-23 21:54 ` Duncan 2 siblings, 0 replies; 35+ messages in thread From: Duncan @ 2016-03-23 21:54 UTC (permalink / raw) To: linux-btrfs Brad Templeton posted on Wed, 23 Mar 2016 12:10:29 -0700 as excerpted: > It is Ubuntu wily, which is 4.2 and btrfs-progs 0.4. Presumably that's a typo for btrfs-progs. Either that or Ubuntu's using a versioning that's totally different than upstream btrfs. For some time now (since the 3.12 release, ancient history in btrfs terms), btrfs-progs has been release version synced with the kernel. So the latest release is 4.5.0, to match the kernel 4.5.0 that came out shortly before that userspace release and that was developed at the same time. Before that was 4.4.1, a primarily bugfix release to the previous 4.4.0. Before 3.12, the previous actual userspace release, extremely stale by that point, was 0.19, tho there was a 0.20-rc1 release, that wasn't followed up with a 0.20 full release. The recommendation back then was to run, and for distros to ship, git snapshots. So where 0.4 came from I've not the foggiest, unless as I said it's a typo, perhaps for 4.0. > I will upgrade to > Xenial in April but probably not before, I don't have days to spend on > this. Is there a fairly safe ppa to pull 4.4 or 4.5? In olden days, I > would patch and build my kernels from source but I just don't have time > for all the long-term sysadmin burden that creates any more. Heh, this posting is from a gentooer, who builds /everything/ from sources. =:^) Tho that's not really a problem as it can go on in the background and thus takes little actual attention time. The real time is in figuring out what I need to know about what has changed between versions and if/how that needs to affect my existing config, but that's time that needs spent regardless of the distro, the major question being one of rolling distro and thus spending that time a bit here and a bit there as the various components upgrade, with a better chance of actually nailing down the problem to a specific package upgrade when there's issues, or doing it all in one huge version upgrade, which pretty much leaves you high and dry in terms of fixing problems since the entire world changes at once and it's thus nearly impossible to pin a bug to a particular package upgrade. But meanwhile, as CMurphy says at the expense of a frowny face... Given that btrfs is still maturing, and /not/ yet entirely stable and mature, and the fact that the list emphasis is on mainline, the list kernel recommendation is to follow one of two tracks, either mainline current, or mainline LTS. If you choose the mainline current track, the recommendation is to stay within the latest two current kernel series. With 4.5 out, that means you should be on 4.4 at least. Previous non-LTS kernel series no longer get patch backports at least from mainline, and as we focus on mainline here, we're not tracking what distros may or may not backport on their own, so we simply can't provide the same level of support. For LTS kernel track, the recommendation has recently relaxed slightly. Previously, it was again to stick with the latest two kernel LTS series, which would be 4.4 and 4.1.
However, the one previous to that was 3.18, and it has been reasonably stable, certainly more so than those previous to that, so while 4.1 or 4.4 is still what we really like to see, we recognize that some will be sticking to 3.18 and are continuing to try to support them as well, now that the LTS 4.4 has pushed it out of the primary recommended range. But previous to that really isn't supported. Not that we won't do best-effort, regardless, but in many instances, the best recommendation we can make with out-of-support kernels really is to upgrade to something more current, and try again. Meanwhile, yes, we do recognize that distros have chosen to support btrfs on kernels outside that list. But as I said, we don't track what patches the distros may or may not have backported, and thus aren't in a particularly good position to provide support for them. The distros themselves, having chosen to provide that support, are in a far better position to do just that, since they know what they've backported and what they haven't. So in that case, the best we can do is refer you to the distros whose support you are nominally relying on, to actually provide that support. And obviously, kernel 4.2 isn't one of the ones named above. It's neither a mainstream LTS, nor any longer within the last two current kernel releases. So kernel upgrade, however you choose to do it, is strongly recommended, with two other alternatives if you prefer: 1) Ask your distro for support of versions off the mainline support list. After all, they're the ones claiming to support the known to be not entirely stabilized and ready for production use btrfs on non-mainline-LTS kernels long after mainline support for those non-LTS kernels has been dropped. 2) Choose a filesystem that better matches your needs, presumably because it /is/ fully mature and stable, and thus is properly supported on older kernels outside the relatively narrow range of btrfs-list recommended kernels. As for userspace, as explained above, in most cases for online and generally operational btrfs, it's the kernel code that counts. Userspace is important in three cases, however: (1) when you're first creating the filesystem (mkfs.btrfs), (2) if you need relatively new features that older userspace doesn't have the kernel code calls to support, and (3) when the filesystem has problems and you're trying to fix them with btrfs check and the other offline tools, or you're simply trying to get what you can off the (presumably unmountable) filesystem using btrfs restore, before giving up on it entirely. So for normal use, btrfs userspace version isn't as critical, until it gets so old translating from newer call syntax to older syntax, or between output formats, becomes a problem. But once your btrfs won't mount properly and you're either trying to fix it or recover files off it, /then/ userspace becomes critical, as the newer versions can deal with more problems than older versions can. Meanwhile, newer btrfs-progs userspace is always designed to be able to handle older kernels as well. So a good rule of thumb for userspace is to run at least the latest userspace release from the series matching your kernel version (with the short period after kernel release before the corresponding userspace release excepted, of course, if you're running /that/ close to current). As long as you stay within kernel recommendations, that will keep your userspace within reason as well.
So a 4.2 kernel isn't supported (on list, but you can of course refer to your distro instead, if they support it) as it's out of the current kernel support range and isn't an LTS, and upgrading to 4.4 LTS is recommended. Alternatively, you may wish to downgrade to kernel 4.1, which is actually an LTS kernel and remains well supported as such. And once you're running a supported kernel, ensure that your btrfs-progs is the latest of that userspace series, or newer, and you should be good to go. =:^) Again, with the alternatives being either getting support from your distro if they're supporting btrfs on versions outside of those supported on-list, or switching to a filesystem that better matches your use-case in terms of stability and longer term support. > Also, I presume if this is a bug, it's in btrfsprogs, though the new one > presumably needs a newer kernel too. Balance, like most "online" btrfs code, is primarily kernel. All userspace does is call the appropriate kernel code to do the actual work. So the problem here is almost certainly kernel. Meanwhile, I have an idea of what /might/ be your balance problem, but I want to cover it in a separate reply. Suffice it to say here that the news isn't great, if this is your issue, as it's a known but somewhat rare problem that has yet to be properly traced down and fixed. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: RAID-1 refuses to balance large drive 2016-03-23 18:34 ` Chris Murphy 2016-03-23 19:10 ` Brad Templeton @ 2016-03-23 22:28 ` Duncan 2016-03-24 7:08 ` Andrew Vaughan 2 siblings, 0 replies; 35+ messages in thread From: Duncan @ 2016-03-23 22:28 UTC (permalink / raw) To: linux-btrfs Chris Murphy posted on Wed, 23 Mar 2016 12:34:10 -0600 as excerpted: > On Wed, Mar 23, 2016 at 10:51 AM, Brad Templeton <bradtem@gmail.com> > wrote: >> Thanks for assist. To reiterate what I said in private: >> >> a) I am fairly sure I swapped drives by adding the 6TB drive and then >> removing the 2TB drive, which would not have made the 6TB think it was >> only 2TB. The btrfs statistics commands have shown from the >> beginning the size of the device as 6TB, and that after the remove, it >> had 4TB unallocated. > > I agree this seems to be consistent with what's been reported. Chris, and Hugo too as the one with the most experience with this, on IRC and privately as well as on-list. Is this possibly another instance of that persistent mystery bug where btrfs pretty much refuses to allocate new chunks despite there being all sorts of room for it to do so, that seems just rare enough that without any known method of replication, keeps getting backburnered by more urgent issues when devs try to properly investigate and trace it down, while being persistent over many kernels now and just common enough, with just enough common characteristics among those affected, to be considered a single, now recognized, bug? If it's the same bug here, it seems to be affecting only the new 6 TB device, not the older and smaller devices, but I'm not sure if it has manifested in that sort of device-exclusive form before, or not, and that along with the facts that there's no fix known and that Hugo seems to be the only one with enough experience with the bug to actually reasonably authoritatively consider it the same bug, has me reluctant to actually label it as such here. But I can certainly ask the question, and I've not yet seen it suggested as the ultimate bug we're facing in this thread yet, so... If Hugo (or Chris if he's seen enough more instances of this bug recently to reasonably reliably say) doesn't post something more authoritative... If this is indeed /that/ bug, then most efforts to fix it won't directly fix it at all. Rebalancing to single, and then back to raid1, /might/ eliminate it... or not, I simply don't have enough experience troubleshooting this bug to know if others tried that and their results or not (tho I'd guess Hugo would have suggested that, where people weren't dealing with a single-device-only case, anyway, and might know the results). The one known way to eliminate the bug is to back everything up, blow away the filesystem and recreate it. Tho AFAIK, in one instance at least, the new btrfs ended up having the same bug. But I believe for most, it does get rid of it. Luckily in the OP's case, the filesystem has evolved over time, so chances are that the bug won't appear on the new btrfs, created from the start with all the devices intended for it currently. It /might/ reappear with time, but I'd hope it'd only appear sometime later, after another device upgrade or two, at least. Of course, that's assuming it's either this bug, or another one that's fixed by starting over with a newly created filesystem with all currently intended devices included in the mkfs.btrfs. -- Duncan - List replies preferred. No HTML msgs. 
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: RAID-1 refuses to balance large drive 2016-03-23 18:34 ` Chris Murphy 2016-03-23 19:10 ` Brad Templeton 2016-03-23 22:28 ` Duncan @ 2016-03-24 7:08 ` Andrew Vaughan 2 siblings, 0 replies; 35+ messages in thread From: Andrew Vaughan @ 2016-03-24 7:08 UTC (permalink / raw) To: Chris Murphy; +Cc: Brad Templeton, Btrfs BTRFS Hi Brad Just a user here, not a dev. I think I might have run into a similar bug about 6 months ago. At the time I was running Debian stable. (iirc that is kernel 3.16 and probably btrfs-progs of a similar vintage). The filesystem was originally a 2 x 6TB array with a 4TB drive added later when space began to get low. I'm pretty sure I must have done at least a partial balance after adding the 4TB drive, but something like 1TB free on each of the two 6TB drives, and 2TB on the 4TB would have been 'good enough for me'. It was nearly full again when a copy unexpectedly reported out-of-space. Balance didn't fix it. In retrospect btrfs had probably run out of chunks on both 6TB drives. I'm not sure what actually fixed it. I upgraded to Debian testing (something I was going to do soon anyway). I might have also temporarily added another drive. (I have since had a 6TB drive fail, and btrfs is running happily on 2x4TB, and 1x6TB). More inline below. On 24 March 2016 at 05:34, Chris Murphy <lists@colorremedies.com> wrote: > On Wed, Mar 23, 2016 at 10:51 AM, Brad Templeton <bradtem@gmail.com> wrote: >> Thanks for assist. To reiterate what I said in private: >> >> a) I am fairly sure I swapped drives by adding the 6TB drive and then >> removing the 2TB drive, which would not have made the 6TB think it was >> only 2TB. The btrfs statistics commands have shown from the beginning >> the size of the device as 6TB, and that after the remove, it had 4TB >> unallocated. > > I agree this seems to be consistent with what's been reported. > <snip> >> >> Some options remaining open to me: >> >> a) I could re-add the 2TB device, which is still there. Then balance >> again, which hopefully would move a lot of stuff. Then remove it again >> and hopefully the new stuff would distribute mostly to the large drive. >> Then I could try balance again. > > Yeah, to do this will require -f to wipe the signature info from that > drive when you add it. But I don't think this is a case of needing > more free space, I think it might be due to the odd number of drives > that are also fairly different in size. > If I recall correctly, when I did a device delete, I thought device delete did remove the btrfs signature. But I could be wrong. > But then what happens when you delete the 2TB drive after the balance? > Do you end up right back in this same situation? > If balance manages to get the data properly distributed across the drives, then the 2TB should be mostly empty, and device delete should be able to remove the 2TB disk. I successfully added a 4TB disk, did a balance, and then removed a failing 6TB from the 3 drive array above. > >> b) It was suggested I could (with a good backup) convert the drive to >> non-RAID1 to free up tons of space and then re-convert. What's the >> precise procedure for that? Perhaps I can do it with a limit to see how >> it works as an experiment? Any way to specifically target the blocks >> that have their two copies on the 2 smaller drives for conversion? 
> > btrfs balance -dconvert=single -mconvert=single -f ## you have to > use -f to force reduction in redundancy > btrfs balance -dconvert=raid1 -mconvert=raid1 I would probably try upgrading to a newer kernel + btrfs-progs first. Before converting back to raid1, I would also run btrfs device usage and check to see whether the all devices have approximately the same amount of unallocated space. If they don't, maybe try running a full balance again. <snip> Andrew ^ permalink raw reply [flat|nested] 35+ messages in thread
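Written out with the full current syntax, the conversion round-trip quoted above looks roughly like the sketch below; the mount point is a placeholder, and since the data temporarily loses its second copy, a verified backup beforehand is essential:

  # convert data chunks to single (metadata is left at raid1 in this sketch;
  # converting metadata down as well would additionally require -f)
  btrfs balance start -dconvert=single /mnt
  # check that unallocated space is now roughly even across the devices
  btrfs device usage /mnt
  # convert back; the "soft" modifier skips chunks already in the target profile
  btrfs balance start -dconvert=raid1,soft /mnt

The first conversion can also be cancelled partway (btrfs balance cancel) once enough chunks have moved off the full devices, which is essentially what was done earlier in this thread.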
Thread overview: 35+ messages
2016-03-23 0:47 RAID-1 refuses to balance large drive Brad Templeton
2016-03-23 4:01 ` Qu Wenruo
2016-03-23 4:47 ` Brad Templeton
2016-03-23 5:42 ` Chris Murphy
[not found] ` <56F22F80.501@gmail.com>
2016-03-23 6:17 ` Chris Murphy
2016-03-23 16:51 ` Brad Templeton
2016-03-23 18:34 ` Chris Murphy
2016-03-23 19:10 ` Brad Templeton
2016-03-23 19:27 ` Alexander Fougner
2016-03-23 19:33 ` Chris Murphy
2016-03-24 1:59 ` Qu Wenruo
2016-03-24 2:13 ` Brad Templeton
2016-03-24 2:33 ` Qu Wenruo
2016-03-24 2:49 ` Brad Templeton
2016-03-24 3:44 ` Chris Murphy
2016-03-24 3:46 ` Qu Wenruo
2016-03-24 6:11 ` Duncan
2016-03-25 13:16 ` Patrik Lundquist
2016-03-25 14:35 ` Henk Slager
2016-03-26 4:15 ` Duncan
[not found] ` <CAHz9+Emc4DsXoMLKYrp1TfN+2r2cXxaJmPyTnpeCZF=h0FhtMg@mail.gmail.com>
2018-05-27 1:27 ` Brad Templeton
2018-05-27 1:41 ` Qu Wenruo
2018-05-27 1:49 ` Brad Templeton
2018-05-27 1:56 ` Qu Wenruo
2018-05-27 2:06 ` Brad Templeton
2018-05-27 2:16 ` Qu Wenruo
2018-05-27 2:21 ` Brad Templeton
2018-05-27 5:55 ` Duncan
2018-05-27 18:22 ` Brad Templeton
2018-05-28 8:31 ` Duncan
2018-06-08 3:23 ` Zygo Blaxell
2016-03-27 4:23 ` Brad Templeton
2016-03-23 21:54 ` Duncan
2016-03-23 22:28 ` Duncan
2016-03-24 7:08 ` Andrew Vaughan