From: Ric Wheeler <rwheeler@redhat.com>
To: Avi Kivity <avi@redhat.com>
Cc: Chris Mason <chris.mason@oracle.com>,
	Stephan von Krawczynski <skraw@ithnet.com>,
	Christoph Hellwig <hch@infradead.org>, jim owens <jowens@hp.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: Some very basic questions
Date: Wed, 22 Oct 2008 11:02:26 -0400
Message-ID: <48FF4082.407@redhat.com>
In-Reply-To: <48FF3EB8.6050306@redhat.com>

Avi Kivity wrote:
> Ric Wheeler wrote:
>>> You want to have spare capacity, enough for one or two (or fifteen) 
>>> drives' worth of data.  When a drive goes bad, you rebuild into the 
>>> spare capacity you have.
>>
>> That is a different model (and one that makes sense, we used that in 
>> Centera for object level protection schemes). It is a nice model as 
>> well, but not how most storage works today.
>
> Well, btrfs is not about duplicating how most storage works today.  
> Spare capacity has significant advantages over spare disks, such as 
> being able to mix disk sizes, RAID levels, and better performance.

Sure, each approach has its advantages. But btrfs is also about being 
able to use common hardware configurations without reinventing things 
where we can avoid it: if we have a working RAID, or enough drives to 
do RAID5 with spares or RAID6, we want to be able to delegate that to 
something else where possible.
>
>>>
>>> When you replace the drive, the filesystem moves data into the new 
>>> drive to take advantage of the new spindle.
>>>
>>
>> When you buy a storage solution (hardware or software), the key here 
>> is "utilized capacity." If you have an enclosure that can host say 
>> 12-15 drives in a 2U enclosure, people normally leave one drive as 
>> spare.  RAID6 is another way to do this. You can do a 4+2 and 4+2 
>> with 66% utilized capacity in RAID 6 or possibly a RAID5 scheme using 
>> like 5+1 and 4+1 with one global spare (75% utilized capacity).
>>
>> That gives you the chance to rebuild your RAID group without 
>> having to physically visit the data center. You can also do fancy 
>> stuff with the spare (like migrate as many blocks as possible before 
>> the RAID rebuild to that spare) which reduces your exposure to the 
>> 2nd drive failure and speeds up your rebuild time.
>>
>> In the end, whether you use a block based RAID solution or an object 
>> based solution, you just need to figure out how to balance your 
>> utilized capacity against performance and data integrity needs.
>
> In both models (spare disk and spare capacity) the storage utilization 
> is the same, or nearly so.  But with spare capacity you get better 
> performance since you have more spindles seeking for your data, and 
> since less of the disk surface is occupied by data, making your seeks 
> shorter.
>
True, you can get more performance if you use all of the hardware you 
have all of the time.

The major difficulty with the spare capacity model is that your recovery 
is not as simple and well understood as RAID rebuilds. If you assume 
that whole drives fail under btrfs mirroring, you are not really doing 
anything more than simple RAID, or do I misunderstand your suggestion?

I don't see the point about head seeking. In RAID, you also have the 
same layout so you minimize head movement (just move more heads per IO 
in parallel).
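
For reference, the utilized capacity figures quoted above (two 4+2 RAID6 
groups at 66%, or 5+1 and 4+1 RAID5 groups plus one global spare at 75%) 
can be checked with a short sketch. It assumes a 12-drive enclosure with 
equal-sized drives; the function name and the (data, parity) tuple layout 
are illustrative only and do not come from any btrfs or md tool.

```python
from fractions import Fraction


def utilized_capacity(groups, spares=0):
    """Fraction of drives holding user data.

    groups: list of (data_drives, parity_drives) tuples, one per RAID group.
    spares: number of idle global spare drives (assumed equal in size).
    """
    data = sum(d for d, _ in groups)
    total = sum(d + p for d, p in groups) + spares
    return Fraction(data, total)


# Two RAID6 groups of 4 data + 2 parity drives: 8 of 12 drives hold data.
raid6 = utilized_capacity([(4, 2), (4, 2)])

# RAID5 groups of 5+1 and 4+1 plus one global spare: 9 of 12 drives hold data.
raid5 = utilized_capacity([(5, 1), (4, 1)], spares=1)

print(f"RAID6 4+2 and 4+2:           {float(raid6):.1%}")
print(f"RAID5 5+1, 4+1 and a spare:  {float(raid5):.1%}")
```

The same arithmetic applies to the spare capacity model if the reserved 
space is counted as a fraction of a drive rather than a whole spare disk.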

ric
