From: Andrei Borzenkov <arvidjaar@gmail.com>
To: "Hans van Kranenburg" <hans@knorrie.org>,
"Zygo Blaxell" <ce3g8jdj@umail.furryterror.org>,
"Jakub Husák" <jakub@husak.pro>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Balancing raid5 after adding another disk does not move/use any data on it
Date: Sat, 16 Mar 2019 09:07:17 +0300
Message-ID: <aa4fc47e-2eee-de11-c73f-00d947f96dbf@gmail.com>
In-Reply-To: <3dce71e1-6caa-59ad-1765-6a29c7dd774f@knorrie.org>

15.03.2019 23:31, Hans van Kranenburg wrote:
...
>>
>>>> If so, shouldn't it be really balancing (spreading) the data among all
>>>> the drives to use all the IOPS capacity, even when the raid5 redundancy
>>>> constraint is currently satisfied?
>>
>> btrfs divides the disks into chunks first, then spreads the data across
>> the chunks. The chunk allocation behavior spreads chunks across all the
>> disks. When you are adding a disk to raid5, you have to redistribute all
>> the old data across all the disks to get balanced IOPS and space usage,
>> hence the full balance requirement.
>>
>> If you don't do a full balance, it will eventually allocate data on
>> all disks, but it will run out of space on sdb, sdc, and sde first,
>> and then be unable to use the remaining 2TB+ on sdd.
>
> Also, if you have a lot of empty space in the current allocations, btrfs
> balance has the tendency to first start packing everything together
> before allocating new (4 disk wide) block groups.
>
> This is annoying, because it can result in moving the same data multiple
> times during balance (into empty space of another existing block group,
> and then when that one has its turn again etc).
>
> So you want to get rid of empty space in existing block groups as soon
> as possible. btrfs-balance-least-used can do this, (also an example from
> python-btrfs), by doing them in order of most empty one first.
>
But if I understand the above correctly, it will still attempt to move
data into the next most empty chunks first. Is there any way to force
allocation of new chunks? Or, better, force usage of chunks with given
stripe width as balance target?
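
For reference, the "most empty one first" ordering quoted above amounts to
sorting block groups by their usage ratio. A minimal sketch with made-up
numbers follows; it is not the actual btrfs-balance-least-used script, and
the (vaddr, used, length) tuple layout and the function name are
hypothetical:

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def least_used_first(block_groups):
    """Return block groups ordered by used/length ratio, emptiest first."""
    return sorted(block_groups, key=lambda bg: bg[1] / bg[2])


# Hypothetical block groups: (vaddr, used_bytes, length_bytes)
block_groups = [
    (0 * GIB, 900 * MIB, GIB),
    (1 * GIB, 100 * MIB, GIB),
    (2 * GIB, 500 * MIB, GIB),
]

for vaddr, used, length in least_used_first(block_groups):
    print(f"vaddr={vaddr} usage={used / length:.0%}")
```

Processing in this order empties the sparsest block groups first, but it
does not by itself control where the relocated data lands.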

This thread actually made me wonder: is there any guarantee (or even
tentative promise) about RAID stripe width from btrfs at all? Is it
possible that RAID5 degrades to a mirror by itself due to unfortunate
space distribution?
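
To make the question concrete, here is a toy model of the allocation
behaviour described above. It assumes, without confirmation from this
thread, that each new raid5 chunk takes one fixed-size stripe from every
device that still has room, and that at least two devices are required.
The function name, the 1 GiB stripe size and the free-space figures are
made up for illustration:

```python
def simulate_raid5_allocation(free_gib, stripe_gib=1, min_devices=2):
    """Toy model of chunk allocation.

    free_gib maps device name to free space in GiB. Each new chunk takes
    one stripe of stripe_gib from every device that still has room.
    Returns (list of chunk widths in allocation order, leftover free space).
    """
    free = dict(free_gib)
    widths = []
    while True:
        # Assumption: every device with enough free space joins the chunk.
        candidates = [dev for dev, space in free.items() if space >= stripe_gib]
        if len(candidates) < min_devices:
            break  # too few devices left, allocation fails
        for dev in candidates:
            free[dev] -= stripe_gib
        widths.append(len(candidates))
    return widths, free


# Nearly full 3-disk array plus one new, mostly empty disk (hypothetical).
print(simulate_raid5_allocation({"sdb": 2, "sdc": 2, "sde": 2, "sdd": 2000}))

# Uneven free space where two devices outlast the others (hypothetical).
print(simulate_raid5_allocation({"a": 1, "b": 1, "c": 3, "d": 3}))
```

Under this model the first case allocates two 4-wide chunks and then stops
with 1998 GiB stranded on sdd; the second allocates one 4-wide chunk
followed by two 2-wide chunks, which is effectively a mirror.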