From: Hugo Mills <hugo@carfax.org.uk>
To: Duncan <1i5t5.duncan@cox.net>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Actual effect of mkfs.btrfs -m raid10 </dev/sdX> ... -d raid10 </dev/sdX> ...
Date: Wed, 20 Nov 2013 08:09:58 +0000
Message-ID: <20131120080958.GA28883@carfax.org.uk>
In-Reply-To: <pan$cd41e$ced598aa$e5d2ed3a$3c71af6f@cox.net>
On Tue, Nov 19, 2013 at 11:16:58PM +0000, Duncan wrote:
> Hugo Mills posted on Tue, 19 Nov 2013 09:06:02 +0000 as excerpted:
>
> > This will happen with RAID-10. The allocator will write stripes as wide
> > as it can: in this case, the first stripes will run across all 8
> > devices, until the SSDs are full, and then will write across the
> > remaining 4 devices.
>
> Hugo, it doesn't change the outcome for this case, but either your
> assertion above is incorrect, or the wiki discussion is incorrect (of
> course, or possibly I'm the one misunderstanding something, in which case
> hopefully replies to this will correct my understanding).
>
> Because I distinctly recall reading on the wiki that for raid, regardless
> of the raid level, btrfs always allocates in pairs (well, I guess it'd be
> pairs of pairs for raid10 mode, and I believe that statement pre-dated
> raid5/6 support so that isn't included). I was actually shocked by that
> because while I knew that was the case for raid1, I had thought that
> other raid levels would stripe as widely as possible, which is what you
> assert above as well.
That's incorrect. I used to think that, a few years ago, and it got
into at least one piece of documentation as a result, but once I
worked out the actual behaviour, I did try to correct it (I definitely
remember fixing the sysadmin guide this way). For striped levels
(RAID-0, 10, 5, 6), the FS will make each stripe as wide as possible --
for RAID-10, this means an even number of devices; for the others, this
is all the devices with free space, down to a RAID-level dependent minimum:
RAID-0: min 2 devices
RAID-10: min 4 devices
RAID-5: min 2 devices (I think)
RAID-6: min 3 devices (I think)
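
As a rough illustration of the rule above (a sketch only, not the kernel's allocator code; the table and function names are made up for this example, and the RAID-5/6 minimums carry the same "I think" as in the list):

```python
# Minimum device counts as listed above. The RAID-5 and RAID-6 values
# are the uncertain ones from the list, not verified against the code.
MIN_DEVICES = {"raid0": 2, "raid10": 4, "raid5": 2, "raid6": 3}


def devices_per_chunk(level, devices_with_space):
    """Return how many devices one new chunk would span.

    Hypothetical helper: uses every device that still has free space,
    rounded down to an even number for RAID-10, and returns 0 when the
    level's minimum cannot be met.
    """
    if devices_with_space < MIN_DEVICES[level]:
        return 0
    if level == "raid10":
        # RAID-10 needs an even number: half the devices per copy.
        return devices_with_space - (devices_with_space % 2)
    return devices_with_space


def raid10_stripe_width(devices_with_space):
    """Width of each striped copy in RAID-10 (two copies in total)."""
    return devices_per_chunk("raid10", devices_with_space) // 2
```

With this, devices_per_chunk("raid0", 3) gives 3, and raid10_stripe_width() gives 3 for both 6 and 7 devices.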
> Now I just have to find where I read that on the wiki...
>
> OK, here's one spot, FAQ, md-raid/device-mapper-raid/btrfs-raid
> differences, btrfs:
>
> https://btrfs.wiki.kernel.org/index.php/FAQ#btrfs
>
> >>>>
>
> btrfs combines all the drives into a storage pool first, and then
> duplicates the chunks as file data is created. RAID-1 is defined
> currently as "2 copies of all the data on different disks". This differs
> from MD-RAID and dmraid, in that those make exactly n copies for n disks.
> In a btrfs RAID-1 on 3 1TB drives we get 1.5TB of usable data. Because
> each block is only copied to 2 drives, writing a given block only
> requires exactly 2 drives spin up, reading requires only 1 drive to
> spinup.
This is correct.
> RAID-0 is similarly defined, with the stripe split among exactly 2 disks.
> 3 1TB drives yield 3TB usable space, but to read a given stripe only
> requires 2 disks.
This is definitely wrong. RAID-0 will use all 3 drives for each
stripe.
> RAID-10 is built on top of these definitions. Every stripe is split
> across to exactly 2 RAID1 sets and those RAID1 sets are written to
> exactly 2 disk (hense 4 disk minimum). A btrfs raid-10 volume with 6 1TB
> drives will yield 3TB usable space with 2 copies of all data, but only 4
This is also wrong. You will get 3 TB of usable space out of 6 × 1 TB
drives, but the individual stripes will be 3 drives wide. You would have
the same behaviour (2 copies, each striped 3 wide) on a 7-device array,
because RAID-10 uses an even number of devices.
> <<<<
>
> [Yes, that ending sentence is incomplete in the wiki.]
>
> So we have:
>
> 1) raid1 is exactly two copies of data, paired devices.
>
> 2) raid0 is a stripe exactly two devices wide (reinforced by to read a
> stripe takes only two devices), so again paired devices.
>
> 3) raid10 is a combination of the above raid0 and raid1 definitions,
> exactly two raid1 pairs, paired in raid0.
>
> So btrfs raid10 is pairs of pairs, each raid0 stripe a pair of raid1
> mirrors. If there's 8 devices, four smaller, four larger, the first
> allocated chunks should be one per device, until the smaller devices fill
> up it'll chunk across the remaining four, but it'll be pairs of pairs of
> pairs, two pair(0)-of-pair(1) stripes wide instead of a single quad(0)-of-
> pair(1) stripe wide.
If the RAID code used pairs for its stripes, that'd be the case,
but it doesn't...
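
To make the 8-device case concrete, here is a toy simulation (a sketch under simplifying assumptions, not the real allocator: every chunk takes one equal-sized unit from each device it spans, devices with the most free space are preferred, and simulate_raid10 is a made-up name):

```python
def simulate_raid10(free_units):
    """Simulate RAID-10 chunk allocation over devices of mixed size.

    free_units: free space per device, in chunk-sized units.
    Returns the number of devices spanned by each successive chunk.
    """
    free = list(free_units)
    spans = []
    while True:
        # Devices that still have space, most free space first
        # (the ordering is an assumption of this sketch).
        candidates = sorted(
            (i for i, units in enumerate(free) if units > 0),
            key=lambda i: free[i],
            reverse=True,
        )
        # Use as many devices as possible, rounded down to an even count.
        count = len(candidates) - (len(candidates) % 2)
        if count < 4:  # RAID-10 minimum of 4 devices
            break
        for i in candidates[:count]:
            free[i] -= 1
        spans.append(count)
    return spans
```

For four small devices with 2 units each and four large ones with 5 units each, simulate_raid10([2, 2, 2, 2, 5, 5, 5, 5]) returns [8, 8, 4, 4, 4]: chunks span all 8 devices until the small ones fill up, then span the remaining 4.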
Hugo.
--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- emacs: Emacs Makes A Computer Slow. ---