linux-btrfs.vger.kernel.org archive mirror
From: Hugo Mills <hugo@carfax.org.uk>
To: Duncan <1i5t5.duncan@cox.net>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Actual effect of mkfs.btrfs -m raid10 </dev/sdX> ... -d raid10 </dev/sdX> ...
Date: Wed, 20 Nov 2013 08:09:58 +0000
Message-ID: <20131120080958.GA28883@carfax.org.uk>
In-Reply-To: <pan$cd41e$ced598aa$e5d2ed3a$3c71af6f@cox.net>


On Tue, Nov 19, 2013 at 11:16:58PM +0000, Duncan wrote:
> Hugo Mills posted on Tue, 19 Nov 2013 09:06:02 +0000 as excerpted:
> 
> > This will happen with RAID-10. The allocator will write stripes as wide
> > as it can: in this case, the first stripes will run across all 8
> > devices, until the SSDs are full, and then will write across the
> > remaining 4 devices.
> 
> Hugo, it doesn't change the outcome for this case, but either your 
> assertion above is incorrect, or the wiki discussion is incorrect (of 
> course, or possibly I'm the one misunderstanding something, in which case 
> hopefully replies to this will correct my understanding).
> 
> Because I distinctly recall reading on the wiki that for raid, regardless 
> of the raid level, btrfs always allocates in pairs (well, I guess it'd be 
> pairs of pairs for raid10 mode, and I believe that statement pre-dated 
> raid5/6 support so that isn't included).  I was actually shocked by that 
> because while I knew that was the case for raid1, I had thought that 
> other raid levels would stripe as widely as possible, which is what you 
> assert above as well.

   That's incorrect. I used to think that, a few years ago, and it got
into at least one piece of documentation as a result, but once I
worked out the actual behaviour, I did try to correct it (I definitely
remember fixing the sysadmin guide this way). For striped levels
(RAID-0, 10, 5, 6), the FS will stripe each chunk across as many
devices as possible: for RAID-10, this means the largest even number
of devices with free space; for the others, it is all the devices
with free space, down to a RAID-level-dependent minimum:

RAID-0:  min 2 devices
RAID-10: min 4 devices
RAID-5:  min 2 devices (I think)
RAID-6:  min 3 devices (I think)
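The selection rule above can be modelled in a few lines of Python. This is a simplified sketch, not the kernel's chunk allocator: it ignores chunk size limits, any cap on the stripe count and how much space each device has, and the RAID-5/6 minimums are copied from the hedged list above. MIN_DEVICES and stripe_width are hypothetical names.

```python
# Simplified model of how many devices a new chunk is striped across,
# following the rules described in this message (not the kernel code).
# RAID-5/6 minimums are taken from the "(I think)" list above.
MIN_DEVICES = {"raid0": 2, "raid10": 4, "raid5": 2, "raid6": 3}


def stripe_width(profile: str, free_devices: int) -> int:
    """Return how many devices a new chunk would span, or 0 if none fits.

    free_devices is the number of devices that still have free space.
    """
    minimum = MIN_DEVICES[profile]
    usable = free_devices
    if profile == "raid10":
        # RAID-10 needs an even number of devices (mirrored pairs).
        usable -= usable % 2
    return usable if usable >= minimum else 0


if __name__ == "__main__":
    for profile in ("raid0", "raid10", "raid5", "raid6"):
        print(profile, [stripe_width(profile, n) for n in range(1, 9)])
```

For example, stripe_width("raid10", 7) gives 6 and stripe_width("raid0", 3) gives 3, in line with the corrections to the wiki text further down.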

> Now I just have to find where I read that on the wiki...
> 
> OK, here's one spot, FAQ, md-raid/device-mapper-raid/btrfs-raid 
> differences, btrfs:
> 
> https://btrfs.wiki.kernel.org/index.php/FAQ#btrfs
> 
> >>>>
> 
> btrfs combines all the drives into a storage pool first, and then 
> duplicates the chunks as file data is created. RAID-1 is defined 
> currently as "2 copies of all the data on different disks". This differs 
> from MD-RAID and dmraid, in that those make exactly n copies for n disks. 
> In a btrfs RAID-1 on 3 1TB drives we get 1.5TB of usable data. Because 
> each block is only copied to 2 drives, writing a given block only 
> requires exactly 2 drives to spin up, reading requires only 1 drive to 
> spin up.

   This is correct.
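To see where the 1.5 TB figure comes from, here is a small Python simulation. It assumes, as a simplification, that each RAID-1 chunk is placed on the two devices with the most free space and that space is counted in whole 1 GB chunks; raid1_usable is a hypothetical helper, not a btrfs tool.

```python
def raid1_usable(device_sizes_gb: list[int], chunk_gb: int = 1) -> int:
    """Simulate RAID-1 allocation: each chunk is written to 2 different devices.

    Returns usable space in GB. Simplified model: always pick the two
    devices with the most free space. Needs at least 2 devices.
    """
    free = list(device_sizes_gb)
    usable = 0
    while True:
        # Device indices, most free space first.
        order = sorted(range(len(free)), key=lambda i: -free[i])
        a, b = order[0], order[1]
        if free[b] < chunk_gb:
            break  # fewer than two devices can take another copy
        free[a] -= chunk_gb
        free[b] -= chunk_gb
        usable += chunk_gb
    return usable


if __name__ == "__main__":
    print(raid1_usable([1000, 1000, 1000]))  # 3 x 1 TB drives
```

With 3 x 1 TB drives this prints 1500 (GB), i.e. 1.5 TB: every chunk costs 2 GB of raw space, and the three drives are drained evenly.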

> RAID-0 is similarly defined, with the stripe split among exactly 2 disks. 
> 3 1TB drives yield 3TB usable space, but to read a given stripe only 
> requires 2 disks.

   This is definitely wrong. RAID-0 will use all 3 drives for each
stripe.
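A sketch of the RAID-0 address mapping, assuming a chunk striped across all 3 devices with a fixed 64 KiB stripe element (treat the exact size as an assumption here). Consecutive 64 KiB pieces of the chunk rotate round-robin across the devices, so reading one full stripe touches every drive. raid0_map and devices_touched are hypothetical names.

```python
STRIPE_LEN = 64 * 1024  # assumed stripe element size (64 KiB)


def raid0_map(offset: int, num_devices: int) -> tuple[int, int]:
    """Map a byte offset within a RAID-0 chunk to (device index, offset on device)."""
    stripe_nr = offset // STRIPE_LEN   # which stripe element overall
    device = stripe_nr % num_devices   # round-robin across devices
    row = stripe_nr // num_devices     # full stripes before this element
    return device, row * STRIPE_LEN + offset % STRIPE_LEN


def devices_touched(offset: int, length: int, num_devices: int) -> set[int]:
    """Return the set of devices a read of [offset, offset + length) touches."""
    first = offset // STRIPE_LEN
    last = (offset + length - 1) // STRIPE_LEN
    return {nr % num_devices for nr in range(first, last + 1)}


if __name__ == "__main__":
    # One full stripe on a 3-device RAID-0 is 3 * 64 KiB = 192 KiB.
    print(devices_touched(0, 3 * STRIPE_LEN, 3))
    print(raid0_map(200 * 1024, 3))
```

devices_touched(0, 3 * STRIPE_LEN, 3) returns {0, 1, 2}: reading one full 192 KiB stripe needs all three drives, not two.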

> RAID-10 is built on top of these definitions. Every stripe is split 
> across to exactly 2 RAID1 sets and those RAID1 sets are written to 
> exactly 2 disks (hence 4 disk minimum). A btrfs raid-10 volume with 6 1TB
> drives will yield 3TB usable space with 2 copies of all data, but only 4

   This is also wrong. You will get 3 TB of usable space out of
6 × 1 TB drives, but the individual stripes will be 3 drives wide,
with each stripe element mirrored on 2 devices. You would see the
same behaviour (2 copies, stripes 3 drives wide) on a 7-device array.
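The arithmetic behind this, as a minimal Python sketch assuming the rule described above (use the largest even number of devices, mirror each stripe element on 2 of them). raid10_layout and raid10_usable_equal are hypothetical helpers; the capacity function only covers an even number of equal-sized devices and does not model how chunks rotate onto a 7th device over time.

```python
def raid10_layout(num_devices: int) -> dict[str, int]:
    """Per-chunk RAID-10 layout under the 'as wide as possible' rule."""
    if num_devices < 4:
        raise ValueError("RAID-10 needs at least 4 devices")
    used = num_devices - num_devices % 2  # largest even device count
    copies = 2                            # each stripe element is mirrored twice
    return {"devices": used, "stripe_width": used // copies, "copies": copies}


def raid10_usable_equal(num_devices: int, size_tb: float) -> float:
    """Usable space for an even number of equal-sized devices: half the raw total."""
    if num_devices % 2:
        raise ValueError("only modelled for an even number of devices")
    return num_devices * size_tb / 2


if __name__ == "__main__":
    print(raid10_layout(6))
    print(raid10_layout(7))
    print(raid10_usable_equal(6, 1.0))
```

Both raid10_layout(6) and raid10_layout(7) report 6 devices, stripes 3 wide and 2 copies, and raid10_usable_equal(6, 1.0) returns 3.0 (TB).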

> <<<<
> 
> [Yes, that ending sentence is incomplete in the wiki.]
> 
> So we have:
> 
> 1) raid1 is exactly two copies of data, paired devices.
> 
> 2) raid0 is a stripe exactly two devices wide (reinforced by to read a 
> stripe takes only two devices), so again paired devices.
> 
> 3) raid10 is a combination of the above raid0 and raid1 definitions, 
> exactly two raid1 pairs, paired in raid0.
> 
> So btrfs raid10 is pairs of pairs, each raid0 stripe a pair of raid1 
> mirrors.  If there's 8 devices, four smaller, four larger, the first  
> allocated chunks should be one per device, until the smaller devices fill 
> up it'll chunk across the remaining four, but it'll be pairs of pairs of 
> pairs, two pair(0)-of-pair(1) stripes wide instead of a single quad(0)-of-
> pair(1) stripe wide.

   If the RAID code used pairs for its stripes, that'd be the case,
but it doesn't...
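The mixed 8-device case from the start of the thread (four smaller devices, four larger) can be simulated under the same "as wide as possible" rule. This is a hedged sketch: sizes are in abstract chunk units, each chunk takes one unit from every device it spans, and the device sizes and the name simulate_raid10 are hypothetical.

```python
def simulate_raid10(sizes: list[int]) -> list[int]:
    """Return the number of devices each successive RAID-10 chunk spans.

    Sizes are in abstract chunk units; each chunk consumes one unit on
    every device it spans. Simplified model of 'stripe as widely as possible'.
    """
    free = list(sizes)
    widths = []
    while True:
        candidates = [i for i, f in enumerate(free) if f > 0]
        candidates.sort(key=lambda i: -free[i])    # most free space first
        n = len(candidates) - len(candidates) % 2  # even number for RAID-10
        if n < 4:                                  # RAID-10 minimum
            return widths
        for i in candidates[:n]:
            free[i] -= 1
        widths.append(n)


if __name__ == "__main__":
    small = [2, 2, 2, 2]  # hypothetical smaller devices (e.g. the SSDs)
    large = [5, 5, 5, 5]  # hypothetical larger devices
    print(simulate_raid10(small + large))
```

This prints [8, 8, 4, 4, 4]: the first chunks span all 8 devices, and once the smaller ones are full, chunks drop to 4 devices wide. A pairs-of-pairs allocator would instead produce only 4-wide chunks throughout.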

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
               --- emacs: Emacs Makes A Computer Slow. ---               


