From: Avi Kivity <avi@argo.co.il>
To: Neil Brown <neilb@suse.de>
Cc: david@lang.hm, linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org
Subject: Re: limits on raid
Date: Sat, 16 Jun 2007 20:23:01 +0300
Message-ID: <46741C75.3000003@argo.co.il>
In-Reply-To: <18035.3009.568832.785308@notabene.brown>

Neil Brown wrote:
>> Some things are not achievable with block-level raid.  For example, with
>> redundancy integrated into the filesystem, you can have three copies for
>> metadata, two copies for small files, and parity blocks for large files,
>> effectively using different raid levels for different types of data on
>> the same filesystem.
>>     
>
> Absolutely.  And doing that is a very good idea quite independent of
> underlying RAID.  Even ext2 stores multiple copies of the superblock.
>
> Having the filesystem duplicate data, store checksums, and be able to
> find a different copy if the first one it chose was bad is very
> sensible and cannot be done by just putting the filesystem on RAID.
>   

The filesystem would need to know a lot about the RAID geometry in order
not to put the copies on the same disks.
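
To make the geometry point concrete, here is a minimal sketch, assuming a
plain striped layout with no parity rotation.  The names disk_for_block and
pick_replica_block, the chunk size, and the disk count are all made up for
illustration; real md layouts are more involved than this.

```python
def disk_for_block(block: int, chunk_blocks: int, ndisks: int) -> int:
    """Disk index holding a logical block in an assumed plain striped layout."""
    return (block // chunk_blocks) % ndisks


def pick_replica_block(primary: int, free_blocks, chunk_blocks: int, ndisks: int):
    """Return the first free block that lives on a different disk than `primary`.

    Without chunk_blocks and ndisks (the geometry), the filesystem cannot
    tell whether two logical blocks share a physical disk.
    """
    primary_disk = disk_for_block(primary, chunk_blocks, ndisks)
    for candidate in free_blocks:
        if disk_for_block(candidate, chunk_blocks, ndisks) != primary_disk:
            return candidate
    return None  # every free block shares the primary's disk
```

With 4-block chunks on 3 disks, logical blocks 0 and 12 both land on disk 0,
so a second copy placed at block 12 would protect nothing.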

> Having the filesystem keep multiple copies of each data block so that
> when one drive dies, another block is used does not excite me quite so
> much.  If you are going to do that, then you want to be able to
> reconstruct the data that should be on a failed drive onto a new
> drive.
> For a RAID system, that reconstruction can go at the full speed of the
> drive subsystem - but needs to copy every block, whether used or not.
> For in-filesystem duplication, it is easy to imagine that being quite
> slow and complex.  It would depend a lot on how you arrange data,
> and maybe there is some clever approach to data layout that I haven't
> thought of.  But I think that sort of thing is much easier to do in a
> RAID layer below the filesystem.
>   

You'd need a reverse mapping of extents to files.  While maintaining
that is expensive, it brings a lot of benefits:

- rebuild a failed drive, without rebuilding free space
- evacuate a drive in anticipation of taking it offline
- efficient defragmentation

The reverse-mapping storage could serve as the free-space store too.
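
A rough sketch of what I mean, assuming an in-memory list of extents keyed
by disk.  Extent, ReverseMap, and the string file identifiers are
hypothetical names for illustration only, not an existing on-disk format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    disk: int
    start: int    # first block of the extent on that disk
    length: int   # number of blocks
    owner: str    # hypothetical file identifier


class ReverseMap:
    """Maps allocated extents back to their owning files."""

    def __init__(self, disk_sizes):
        self.disk_sizes = dict(disk_sizes)  # disk index -> size in blocks
        self.extents = []

    def add(self, extent):
        self.extents.append(extent)

    def used_on(self, disk):
        """Allocated extents on one disk, in block order."""
        return sorted((e for e in self.extents if e.disk == disk),
                      key=lambda e: e.start)

    def free_on(self, disk):
        """Free space is whatever the reverse map does not cover."""
        free, cursor = [], 0
        for e in self.used_on(disk):
            if e.start > cursor:
                free.append((cursor, e.start - cursor))
            cursor = max(cursor, e.start + e.length)
        if cursor < self.disk_sizes[disk]:
            free.append((cursor, self.disk_sizes[disk] - cursor))
        return free

    def blocks_to_rebuild(self, disk):
        """A rebuild or evacuation only has to touch allocated blocks."""
        return sum(e.length for e in self.used_on(disk))
```

Rebuilding or evacuating a drive then walks used_on(disk) rather than every
block, and the allocator can read free_on(disk) instead of keeping a
separate bitmap.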

> Combining these thoughts, it would make a lot of sense for the
> filesystem to be able to say to the block device "That block looks
> wrong - can you find me another copy to try?".  That is an example of
> the sort of closer integration between filesystem and RAID that would
> make sense.
>   

It's a step forward, but still quite limited compared to combining the
two layers together.  Sticking with the example above, you still can't
have a mix of parity-protected files and mirror-protected files; the
RAID decides that for you.
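
As an illustration of the per-file choice a combined layer could make, here
is a small sketch.  The 64 KiB threshold and the names choose_redundancy and
xor_parity are assumptions of mine; the policy is the one from the example
above (three copies for metadata, two for small files, parity for large
files).

```python
from functools import reduce

SMALL_FILE_LIMIT = 64 * 1024  # assumed cutoff between "small" and "large"


def choose_redundancy(is_metadata: bool, size_bytes: int):
    """Pick a protection scheme per object rather than per device."""
    if is_metadata:
        return ("mirror", 3)   # three copies for metadata
    if size_bytes <= SMALL_FILE_LIMIT:
        return ("mirror", 2)   # two copies for small files
    return ("parity", 1)       # one parity block per stripe for large files


def xor_parity(blocks):
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))
```

XORing the parity block with the surviving data blocks gives back the
missing one, which is how a parity-protected large file would be recovered
while its mirror-protected neighbours simply read another copy.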

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

