linux-btrfs.vger.kernel.org archive mirror
* Btrfs RAID space utilization and bitrot reconstruction
From: Waxhead @ 2012-07-01 11:50 UTC
  To: linux-btrfs

As far as I understand btrfs stores all data in huge chunks that are 
striped, mirrored or "raid5/6'ed" throughout all the disks added to the 
filesystem/volume.

How does btrfs deal with different sized disks? Let's say, for 
example, that you have 10 different disks that are 100GB, 200GB, 
300GB ... 1000GB and you create a btrfs filesystem with all the disks. 
How will the raid5 implementation distribute chunks in such a setup? 
I assume the stripe+stripe+parity are separate chunks that are placed 
on separate disks, but how does btrfs select the best disk to store a 
chunk on? In short, will a slow disk slow down the entire "array", 
only parts of it, or will btrfs attempt to use the fastest disks first?

Also, since btrfs checksums both data and metadata, I am thinking that 
at least the raid6 implementation could perhaps (try to) reconstruct 
corrupt data (and try to rewrite it) before reading an alternate copy. 
Can someone please fill me in on the details here?

Finally, how does btrfs deal with advanced format (4k sector) drives 
when the entire drive (and not a partition) is used to build a btrfs 
filesystem? Is proper alignment achieved?



* Re: Btrfs RAID space utilization and bitrot reconstruction
From: Hugo Mills @ 2012-07-01 12:27 UTC
  To: Waxhead; +Cc: linux-btrfs

On Sun, Jul 01, 2012 at 01:50:39PM +0200, Waxhead wrote:
> As far as I understand btrfs stores all data in huge chunks that are
> striped, mirrored or "raid5/6'ed" throughout all the disks added to
> the filesystem/volume.

   Well, RAID-5/6 hasn't landed yet, but yes.

> How does btrfs deal with different sized disks? Let's say, for
> example, that you have 10 different disks that are 100GB, 200GB,
> 300GB ... 1000GB and you create a btrfs filesystem with all the
> disks. How will the raid5 implementation distribute chunks in such
> a setup?

   We haven't seen the code for that bit yet.

> I assume the stripe+stripe+parity are separate chunks that are
> placed on separate disks, but how does btrfs select the best disk to
> store a chunk on? In short, will a slow disk slow down the entire
> "array", only parts of it, or will btrfs attempt to use the fastest
> disks first?

   Chunks are allocated by ordering the devices by the amount of free
(=unallocated) space left on each, and picking the chunks from devices
in that order. For RAID-1, chunks are picked in pairs. For RAID-0, "as
many as possible" are picked, down to a minimum of 2 (I think). For
RAID-10, the largest even number possible is picked, down to a minimum
of 4. I _believe_ that RAID-5 and -6 will pick as many as possible,
down to some minimum -- but as I said, we haven't seen the code yet.
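To make that ordering concrete, here is a minimal Python sketch of the allocation policy as described. It is an illustration under simplifying assumptions, not the kernel's allocator: every chunk takes a fixed 1 GB stripe from each chosen device, metadata chunks and real chunk-size limits are ignored, and `PROFILES`, `allocate_chunk()` and `fill()` are hypothetical names. RAID-5/6 is left out since that code hasn't been posted.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical profile table encoding the rules described above:
# profile -> (minimum devices, round down to an even count?, maximum devices or None)
PROFILES: Dict[str, Tuple[int, bool, Optional[int]]] = {
    "raid1": (2, False, 2),      # picked in pairs
    "raid0": (2, False, None),   # as many as possible, at least 2
    "raid10": (4, True, None),   # largest even number, at least 4
}


def allocate_chunk(free: Dict[str, int], profile: str, stripe: int = 1) -> List[str]:
    """Allocate one chunk, returning the devices it spans ([] if it cannot fit)."""
    min_devs, even, max_devs = PROFILES[profile]
    # Order devices by unallocated space, most free first.
    candidates = sorted((d for d in free if free[d] >= stripe),
                        key=lambda d: free[d], reverse=True)
    count = len(candidates) if max_devs is None else min(len(candidates), max_devs)
    if even:
        count -= count % 2
    if count < min_devs:
        return []
    chosen = candidates[:count]
    for dev in chosen:
        free[dev] -= stripe  # this device's share of the chunk is now allocated
    return chosen


def fill(sizes_gb: List[int], profile: str) -> int:
    """Allocate 1 GB stripes until no chunk fits; return the data capacity in GB."""
    free = {f"disk{i}": size for i, size in enumerate(sizes_gb)}
    mirrored = profile in ("raid1", "raid10")
    data = 0
    while True:
        chosen = allocate_chunk(free, profile)
        if not chosen:
            return data
        # Mirrored profiles store every byte twice; RAID-0 stores it once.
        data += len(chosen) // 2 if mirrored else len(chosen)


if __name__ == "__main__":
    sizes = [100 * k for k in range(1, 11)]  # the 100GB ... 1000GB example
    for profile in PROFILES:
        print(profile, fill(sizes, profile))
```

Under these assumptions the ten-disk example yields 2750GB of data for raid1 and 5400GB for raid0. RAID-0 stops once only the 1000GB disk has space left, because a chunk needs at least two devices.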

> Also, since btrfs checksums both data and metadata, I am thinking that
> at least the raid6 implementation could perhaps (try to) reconstruct
> corrupt data (and try to rewrite it) before reading an alternate copy.
> Can someone please fill me in on the details here?

   Yes, it should be possible to do that with RAID-5 as well: read
the data stripes and verify their checksums; if one fails, read the
parity, verify that, and reconstruct the bad block from the known-good
data.
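The read path just described can be sketched for the single-parity (RAID-5) case. This is a minimal Python illustration, not btrfs code: `zlib.crc32` stands in for the crc32c checksums btrfs actually uses, the helper names are hypothetical, and because the sketch assumes the parity block carries no checksum of its own, it verifies the parity indirectly by checking the rebuilt block against the stored checksum of the block it replaces. RAID-6's second parity needs Galois-field arithmetic and is not shown.

```python
import zlib
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks; this is how RAID-5 parity is formed."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def csum(block: bytes) -> int:
    # Stand-in checksum: btrfs uses crc32c, while zlib.crc32 is plain CRC-32.
    return zlib.crc32(block)


def read_stripe(data: List[bytes], parity: bytes, csums: List[int]) -> Optional[List[bytes]]:
    """Return verified data blocks, rebuilding at most one bad block from parity."""
    bad = [i for i, block in enumerate(data) if csum(block) != csums[i]]
    if not bad:
        return data
    if len(bad) > 1:
        return None  # single parity can rebuild only one block
    i = bad[0]
    others = [block for j, block in enumerate(data) if j != i]
    rebuilt = xor_blocks(others + [parity])
    if csum(rebuilt) != csums[i]:
        return None  # parity is bad too, so report a read error instead
    repaired = list(data)
    repaired[i] = rebuilt  # a real filesystem would also rewrite this block on disk
    return repaired


if __name__ == "__main__":
    good = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_blocks(good)
    sums = [csum(b) for b in good]
    print(read_stripe([b"AAAA", b"BxBB", b"CCCC"], parity, sums) == good)
```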

> Finally, how does btrfs deal with advanced format (4k sector) drives
> when the entire drive (and not a partition) is used to build a btrfs
> filesystem? Is proper alignment achieved?

   I don't know about that. However, the native block size in btrfs is
4k, so I'd imagine that it's all good.
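The alignment question comes down to byte-offset arithmetic. The sketch below assumes a 512e advanced-format drive (4096-byte physical sectors exposed as 512-byte logical sectors) and takes the 4k btrfs block size mentioned above; the helper names and the sector-63 legacy partition start are illustrative assumptions, not something btrfs itself checks.

```python
from typing import List

LOGICAL_SECTOR = 512    # bytes per LBA as reported to the OS (assumed 512e drive)
PHYSICAL_SECTOR = 4096  # bytes per physical sector on an advanced-format drive
BTRFS_BLOCK = 4096      # btrfs native block size, per the reply above


def block_offsets(start_lba: int, nblocks: int) -> List[int]:
    """Byte offsets on the disk of the first `nblocks` filesystem blocks."""
    base = start_lba * LOGICAL_SECTOR
    return [base + k * BTRFS_BLOCK for k in range(nblocks)]


def is_aligned(start_lba: int, nblocks: int = 8) -> bool:
    """True if every filesystem block begins on a physical-sector boundary."""
    return all(off % PHYSICAL_SECTOR == 0 for off in block_offsets(start_lba, nblocks))


if __name__ == "__main__":
    for label, lba in [("whole drive", 0), ("legacy partition", 63), ("1 MiB partition", 2048)]:
        print(label, lba * LOGICAL_SECTOR, is_aligned(lba))
```

A filesystem on the whole drive starts at byte 0, a multiple of 4096, so each 4k block lands exactly on a physical sector. By contrast, a partition starting at sector 63 (byte 32256) would put every block 3584 bytes into a physical sector.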

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- You stay in the theatre because you're afraid of having no ---    
                         money? There's irony...                         


* Re: Btrfs RAID space utilization and bitrot reconstruction
From: Martin Steigerwald @ 2012-07-02 18:00 UTC
  To: linux-btrfs; +Cc: Waxhead

On Sunday, 1 July 2012, Waxhead wrote:
> As far as I understand btrfs stores all data in huge chunks that are 
> striped, mirrored or "raid5/6'ed" throughout all the disks added to
> the filesystem/volume.

Not through all disks. At least not with the current RAID-1 
implementation. It stores two copies of a chunk, no matter how many drives 
you use.
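Because every RAID-1 chunk has exactly two copies on two different devices, usable space can be estimated without simulating allocation: each chunk consumes two units of raw space, and at least one of them must come from outside the largest device. The helper below (a hypothetical name) encodes just those two bounds; it ignores chunk-size granularity and metadata, so treat its result as an upper estimate.

```python
from typing import List


def raid1_usable(sizes_gb: List[int]) -> int:
    """Estimate usable RAID-1 capacity (GB) when every chunk is stored exactly twice."""
    total = sum(sizes_gb)
    largest = max(sizes_gb)
    # Two copies per chunk: at most half of all raw space can hold data.
    # Each chunk also needs a second device, so space on the largest disk
    # beyond what the other disks can mirror stays unused.
    return min(total // 2, total - largest)


if __name__ == "__main__":
    print(raid1_usable([1000, 1000, 1000]))  # three drives, still only two copies
    print(raid1_usable([100, 1000]))         # the small disk limits the mirror
```

With three 1000GB drives this gives 1500GB, not the 1000GB a three-way mirror would leave, because each chunk still has only two copies.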

For the rest, see Hugo's answer.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
