* Btrfs RAID space utilization and bitrot reconstruction
@ 2012-07-01 11:50 Waxhead
2012-07-01 12:27 ` Hugo Mills
2012-07-02 18:00 ` Martin Steigerwald
From: Waxhead @ 2012-07-01 11:50 UTC
To: linux-btrfs
As far as I understand btrfs stores all data in huge chunks that are
striped, mirrored or "raid5/6'ed" throughout all the disks added to the
filesystem/volume.
How does btrfs deal with different sized disks? Let's say, for example,
that you have 10 different disks that are 100GB,200GB,300GB...1000GB and
you create a btrfs filesystem with all the disks. How will the raid5
implementation distribute chunks in such a setup? I assume the
stripe+stripe+parity are separate chunks that are placed on separate
disks, but how does btrfs select the best disk to store a chunk on? In
short, will a slow disk slow down the entire "array" or parts of it, or
will btrfs attempt to use the fastest disks first?
Also, since btrfs checksums both data and metadata, I am thinking that
at least the raid6 implementation can perhaps (try to) reconstruct
corrupt data (and try to rewrite it) before reading an alternate copy.
Can someone please fill me in on the details here?
Finally, how does btrfs deal with advanced format (4k sectors) drives
when the entire drive (and not a partition) is used to build a btrfs
filesystem. Is proper alignment achieved?
* Re: Btrfs RAID space utilization and bitrot reconstruction
From: Hugo Mills @ 2012-07-01 12:27 UTC
To: Waxhead; +Cc: linux-btrfs
On Sun, Jul 01, 2012 at 01:50:39PM +0200, Waxhead wrote:
> As far as I understand btrfs stores all data in huge chunks that are
> striped, mirrored or "raid5/6'ed" throughout all the disks added to
> the filesystem/volume.
Well, RAID-5/6 hasn't landed yet, but yes.
> How does btrfs deal with different sized disks? Let's say, for
> example, that you have 10 different disks that are
> 100GB,200GB,300GB...1000GB and you create a btrfs filesystem with
> all the disks. How will the raid5 implementation distribute chunks
> in such a setup?
We haven't seen the code for that bit yet.
> I assume the stripe+stripe+parity are separate chunks that are
> placed on separate disks but how does btrfs select the best disk to
> store a chunk on? In short will a slow disk slow down the entire
> "array", parts of it or will btrfs attempt to use the fastest disks
> first?
Chunks are allocated by ordering the devices by the amount of free
(=unallocated) space left on each, and picking the chunks from devices
in that order. For RAID-1, chunks are picked in pairs. For RAID-0, "as
many as possible" are picked, down to a minimum of 2 (I think). For
RAID-10, the largest even number possible is picked, down to a minimum
of 4. I _believe_ that RAID-5 and -6 will pick as many as possible,
down to some minimum -- but as I said, we haven't seen the code yet.
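The selection rule described above can be sketched roughly as follows.
This is only an illustration of the ordering logic, not the btrfs
allocator itself: the function name pick_devices, the PROFILES table and
the device names are made up for the example, the RAID-0 minimum of 2 is
the uncertain figure from the paragraph above, and RAID-5/6 are left out
because their code has not been published.

```python
from typing import Dict, List, Optional

# Hypothetical per-profile limits, taken from the description above:
# (minimum devices, maximum devices or None for "as many as possible",
#  whether the device count must be even).
PROFILES = {
    "raid1": (2, 2, False),
    "raid0": (2, None, False),   # minimum of 2 is an assumption ("I think")
    "raid10": (4, None, True),
}


def pick_devices(unallocated: Dict[str, int], profile: str) -> Optional[List[str]]:
    """Return the devices a new chunk would span, or None if too few qualify."""
    min_devs, max_devs, must_be_even = PROFILES[profile]
    # Only devices that still have unallocated space are candidates.
    candidates = [dev for dev, free in unallocated.items() if free > 0]
    # Order by unallocated space, most free first (ties broken by name).
    candidates.sort(key=lambda dev: (-unallocated[dev], dev))
    count = len(candidates) if max_devs is None else min(max_devs, len(candidates))
    if must_be_even:
        count -= count % 2  # round down to the largest even number
    if count < min_devs:
        return None
    return candidates[:count]


# Unallocated space per device in GB (illustrative values only).
free = {"sda": 100, "sdb": 200, "sdc": 300, "sdd": 0, "sde": 500}
print(pick_devices(free, "raid1"))   # the two devices with the most free space
print(pick_devices(free, "raid0"))   # every device with free space
print(pick_devices(free, "raid10"))  # largest even number of devices
```

Note that nothing in this ordering looks at device speed; only
unallocated space is considered.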
> Also since btrfs checksums both data and metadata I am thinking that
> at least the raid6 implementation perhaps can (try to) reconstruct
> corrupt data (and try to rewrite it) before reading an alternate
> copy. Can someone please fill me in on the details here?
Yes, it should be possible to do that with RAID-5 as well. (Read
the data stripes, verify checksums, if one fails, read the parity,
verify that, and reconstruct the bad block from the known-good data).
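As a rough sketch of the steps in parentheses above, assuming plain
single-parity XOR as in classic RAID-5: the names xor_blocks and
read_stripe are hypothetical, zlib.crc32 is only a stand-in for the
filesystem's checksum, and the parity is verified indirectly here by
checking the rebuilt block against its stored checksum. None of this is
taken from btrfs code.

```python
import zlib
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_stripe(data: List[bytes], parity: bytes, checksums: List[int]) -> List[bytes]:
    """Return the data blocks, rebuilding at most one that fails its checksum."""
    bad = [i for i, block in enumerate(data) if zlib.crc32(block) != checksums[i]]
    if not bad:
        return list(data)  # everything verified, parity is not needed
    if len(bad) > 1:
        raise IOError("more than one bad block; single parity cannot recover")
    i = bad[0]
    # XOR of the known-good data blocks and the parity yields the missing block.
    good = [block for j, block in enumerate(data) if j != i]
    rebuilt = xor_blocks(good + [parity])
    # If the parity (or another block) were also corrupt, this check would fail.
    if zlib.crc32(rebuilt) != checksums[i]:
        raise IOError("reconstruction failed; parity or checksum is also bad")
    repaired = list(data)
    repaired[i] = rebuilt
    return repaired


original = [b"aaaa", b"bbbb", b"cccc"]
parity = xor_blocks(original)
sums = [zlib.crc32(block) for block in original]
damaged = [original[0], b"bXbb", original[2]]  # simulate bitrot in block 1
print(read_stripe(damaged, parity, sums) == original)
```

A real implementation would also write the rebuilt block back to disk;
that step is omitted here.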
> Finally, how does btrfs deal with advanced format (4k sectors) drives
> when the entire drive (and not a partition) is used to build a btrfs
> filesystem. Is proper alignment achieved?
I don't know about that. However, the native block size in btrfs is
4k, so I'd imagine that it's all good.
Hugo.
--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- You stay in the theatre because you're afraid of having no ---
money? There's irony...
* Re: Btrfs RAID space utilization and bitrot reconstruction
From: Martin Steigerwald @ 2012-07-02 18:00 UTC
To: linux-btrfs; +Cc: Waxhead
On Sunday, 1 July 2012, Waxhead wrote:
> As far as I understand btrfs stores all data in huge chunks that are
> striped, mirrored or "raid5/6'ed" throughout all the disks added to
> the filesystem/volume.
Not through all disks. At least not with the current RAID-1
implementation. It stores two copies of a chunk, no matter how many drives
you use.
For the rest, see Hugo's answer.
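To illustrate what "two copies, no matter how many drives" means for
usable space on unequal disks, here is a small simulation. It assumes
each RAID-1 chunk puts one copy on each of the two devices with the most
unallocated space; the function name raid1_usable and the chunk size of
one arbitrary unit are made up for the example.

```python
from typing import List


def raid1_usable(sizes: List[int], chunk: int = 1) -> int:
    """Simulate RAID-1 chunk allocation and return usable (not raw) capacity."""
    free = list(sizes)
    usable = 0
    while True:
        # Device indices ordered by unallocated space, most free first.
        order = sorted(range(len(free)), key=lambda i: free[i], reverse=True)
        # Stop when fewer than two devices can still hold a copy.
        if len(order) < 2 or free[order[1]] < chunk:
            return usable
        for i in order[:2]:
            free[i] -= chunk  # one copy on each of the two chosen devices
        usable += chunk


print(raid1_usable([100, 200, 300]))   # all space can be paired up
print(raid1_usable([100, 100, 1000]))  # most of the large disk stays unused
```

With three disks the data still exists in only two copies; the third
disk adds capacity, not an extra mirror.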
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7