Re: Better distribution of RAID1 data?

From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Brian B <brian@sd85.net>, linux-btrfs@vger.kernel.org
Subject: Re: Better distribution of RAID1 data?
Date: Fri, 15 Feb 2019 11:54:57 -0500
Message-ID: <91c2c290-5796-3f18-804e-0c19ae17f1db@gmail.com>
In-Reply-To: <db678e75-6f56-a247-adb9-c7cca4d63528@sd85.net>

On 2019-02-15 10:40, Brian B wrote:
> It looks like the btrfs code currently uses the total space available on
> a disk to determine where it should place the two copies of a file in
> RAID1 mode.  Wouldn't it make more sense to use the _percentage_ of free
> space instead of the number of free bytes?
> 
> For example, I have two disks in my array that are 8 TB, plus an
> assortment of 3, 4, and 1 TB disks.  With the current allocation code,
> btrfs will use my two 8 TB drives exclusively until I've written 4 TB of
> files, then it will start using the 4 TB disks, then eventually the 3,
> and finally the 1 TB disks.  If the code used a percentage figure
> instead, it would spread the allocations much more evenly across the
> drives, ideally spreading load and reducing drive wear.
> 
> Is there a reason this is done this way, or is it just something that
> hasn't had time for development?
The current approach is simple to implement, easy to verify, runs fast, 
produces optimal or near-optimal space usage in pretty much all cases, 
and is highly deterministic.
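
To make that concrete, here is a rough sketch of the "two devices with 
the most free bytes" rule, run against the disk sizes from the question.  
It is a simplified model and not the kernel code: the function names are 
made up, ties are assumed to go to the lower device index, and chunks 
are an unrealistically large 1 TB so the numbers stay readable.

```python
def pick_two_most_free(free):
    """Return indices of the two devices with the most free space.

    Ties go to the lower device index (an assumption of this sketch),
    so the choice is fully deterministic.
    """
    order = sorted(range(len(free)), key=lambda i: (-free[i], i))
    return order[0], order[1]


def simulate_raid1(sizes, chunk=1):
    """Allocate mirrored chunks until two devices can no longer hold one.

    Returns the list of (device_a, device_b) pairs in allocation order.
    """
    free = list(sizes)
    history = []
    while True:
        a, b = pick_two_most_free(free)
        if free[b] < chunk:  # b never has more free space than a
            break
        free[a] -= chunk
        free[b] -= chunk
        history.append((a, b))
    return history


# Sizes in TB, matching the array described in the question.
sizes_tb = [8, 8, 4, 3, 1]
history = simulate_raid1(sizes_tb)
print(history[:4])   # the first four allocations
print(len(history))  # total TB of mirrored data that fit
```

In this model the first four allocations all land on the two 8 TB 
devices, and 12 TB of mirrored data fits in total, which is half of the 
24 TB of raw space.  The whole thing needs only comparisons and 
subtraction.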

Using percentages makes the allocator less simple, harder to verify, 
and slower (division is still slow on most CPUs, and you need division 
for percentages).  It's also likely to be less deterministic, both 
because choosing the first devices is harder when all of them are 100% 
empty and because of potential rounding errors.  And it probably won't 
produce optimal layouts quite as reliably: you either need to get into 
floating-point math (which is to be avoided in the kernel whenever 
possible), or you end up with much more quantized disk selection.
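
As a small illustration of the quantization and tie problems, here is a 
sketch using an integer percentage that rounds down.  The helper name 
and the device figures are invented for the example; nothing here is 
taken from the BTRFS source.

```python
def percent_free(free_bytes, size_bytes):
    """Integer percentage of free space, rounded down (no floating point)."""
    return free_bytes * 100 // size_bytes


GB = 10**9

# Hypothetical devices as (size, free) in bytes.
big = (8000 * GB, 7930 * GB)    # 99.125% free
small = (1000 * GB, 999 * GB)   # 99.9% free

# Both collapse to the same integer, so the percentage alone cannot
# tell these two devices apart.
print(percent_free(big[1], big[0]), percent_free(small[1], small[0]))

# Freshly added devices are all 100% free regardless of their size, so
# some extra tie-breaking rule is needed to pick the first devices.
empty = [(s * GB, s * GB) for s in (8000, 8000, 4000, 3000, 1000)]
print({percent_free(free, size) for size, free in empty})
```

Both example devices come out at 99, and every empty device comes out 
at 100, so a percentage-based selector needs additional rules that the 
free-bytes comparison gets away without.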

An adapted percentage method that preferentially spreads allocations 
across disks whenever possible might _possibly_ make sense once we can 
properly parallelize disk access in BTRFS, but until then I see no 
reason to change something that is already working reasonably well.

In your particular case, I'd actually suggest using something under 
BTRFS to merge the smaller disks to get as many devices as close to 8TB 
as possible.  That should help spread load better.
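
A rough sketch of why that helps, again using a simplified model of the 
"most free bytes" rule rather than the real allocator.  It assumes the 
4, 3, and 1 TB disks are combined into a single 8 TB device, that ties 
go to the lower device index, and that chunks are 1 TB for readability.

```python
def allocate(free, count, chunk=1):
    """Place `count` mirrored chunks, each on the two devices with the
    most free space, and return the remaining free space per device.

    This sketch does not check for running out of space, so keep
    `count` small relative to the device sizes.
    """
    free = list(free)
    for _ in range(count):
        a, b = sorted(range(len(free)), key=lambda i: (-free[i], i))[:2]
        free[a] -= chunk
        free[b] -= chunk
    return free


original = [8, 8, 4, 3, 1]   # TB, the array as described
merged = [8, 8, 4 + 3 + 1]   # TB, smaller disks combined into one device

print(allocate(original, 3))  # free space after 3 TB of mirrored writes
print(allocate(merged, 3))
```

After 3 TB of mirrored writes, the original layout has touched only the 
two 8 TB devices (5, 5, 4, 3, 1 TB free), while the merged layout has 
written evenly to all three devices (6 TB free on each).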

