* parity data
From: Eric Anopolsky @ 2008-09-07 5:43 UTC (permalink / raw)
To: linux-btrfs
Hi,
A couple of questions.
1. Does btrfs currently have anything like raid 5 or 6?
2. One guy on my LUG's mailing list is really excited about the
potential for setting redundancy on a per-file basis.
I.e. /home/eric/criticalfile gets mirrored across all of the drives in
the filesystem but /home/eric/temporaryfile gets striped. I'm skeptical.
Is it a good idea to allow people/programs to do this?
Cheers,
Eric
* Re: parity data
From: Chris Mason @ 2008-09-08 14:47 UTC (permalink / raw)
To: Eric Anopolsky; +Cc: linux-btrfs
On Sat, 2008-09-06 at 23:43 -0600, Eric Anopolsky wrote:
> Hi,
>
> A couple of questions.
>
> 1. Does btrfs currently have anything like raid 5 or 6?
>
Not yet, it might one day.
> 2. One guy on my LUG's mailing list is really excited about the
> potential for setting redundancy on a per-file basis.
> I.e. /home/eric/criticalfile gets mirrored across all of the drives in
> the filesystem but /home/eric/temporaryfile gets striped. I'm skeptical.
> Is it a good idea to allow people/programs to do this?
In general, yes. Some files or directories are crucial, and some (swap
for example) don't need to survive a crash.
But, I think the flexibility should go a little further. The goal is to
be able to define drive groups and tie files or directory trees to the
drive groups. That way you can say these files go to the fastest drives
and these files go to some other drive type, etc etc.
-chris
* Re: parity data
From: Eric Anopolsky @ 2008-09-09 0:46 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-btrfs
On Mon, 2008-09-08 at 10:47 -0400, Chris Mason wrote:
> On Sat, 2008-09-06 at 23:43 -0600, Eric Anopolsky wrote:
> > Hi,
> >
> > A couple of questions.
> >
> > 1. Does btrfs currently have anything like raid 5 or 6?
> >
>
> Not yet, it might one day.
>
> > 2. One guy on my LUG's mailing list is really excited about the
> > potential for setting redundancy on a per-file basis.
> > I.e. /home/eric/criticalfile gets mirrored across all of the drives in
> > the filesystem but /home/eric/temporaryfile gets striped. I'm skeptical.
> > Is it a good idea to allow people/programs to do this?
>
> In general, yes. Some files or directories are crucial, and some (swap
> for example) don't need to survive a crash.
If a disk dies in a redundant configuration, I'd like to be able to hot
replace the failed disk and keep going without any interruption. So
losing parts of the paging file would be pretty bad in that case.
Isn't a partially failed array (where some files are accessible and
others are not, without any additional filesystem damage) a weird
failure mode? Do people know how to deal with this? Do applications know
how to deal with this?
What kind of file would be important enough to keep around but
unimportant enough that it could be lost at any time while the system is
up without anyone knowing or caring?
> But, I think the flexibility should go a little further. The goal is to
> be able to define drive groups and tie files or directory trees to the
> drive groups. That way you can say these files go to the fastest drives
> and these files go to some other drive type, etc etc.
Let's say I have 4 100GB drives (2 fast ones and 2 slow ones). I've
restricted a performance critical directory to the two fastest drives,
currently totaling 100GB of performance critical data. The rest of the
data on the system is striped.
How much free space do I have on the filesystem? 100GB (the amount of
data I can store in the performance critical directory)? 200GB (the
amount of data I can store outside the performance critical directory if
the striping is guaranteed)? 300GB (the amount of data I can store
outside the performance critical directory if the striping is best
effort)?
I'm open to being convinced otherwise, but I think issues like this
would crop up any time the filesystem is artificially prevented from
load balancing the data across the drives.
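The three candidate answers above can be reproduced with a little arithmetic. What follows is only a minimal sketch in Python, not anything btrfs does: it assumes the 100GB of performance critical data is spread evenly over the two fast drives and that nothing else has been written yet. The drive names and function names are made up for illustration.

```python
DRIVE_SIZE_GB = 100
FAST = ["fast0", "fast1"]
SLOW = ["slow0", "slow1"]

# Assumption: the 100 GB of performance-critical data is spread evenly
# over the two fast drives, and nothing else has been written yet.
free_gb = {name: DRIVE_SIZE_GB for name in FAST + SLOW}
for name in FAST:
    free_gb[name] -= 100 // len(FAST)


def restricted_free(free, group):
    """Space left for data that may only live on the drives in `group`."""
    return sum(free[name] for name in group)


def guaranteed_stripe_free(free):
    """Space left if every stripe must span all drives equally:
    the fullest drive runs out first and caps the total."""
    return min(free.values()) * len(free)


def best_effort_free(free):
    """Space left if data may fall back to whichever drives have room."""
    return sum(free.values())


print(restricted_free(free_gb, FAST))   # 100
print(guaranteed_stripe_free(free_gb))  # 200
print(best_effort_free(free_gb))        # 300
```

Under these assumptions all three numbers are correct at once; they answer different questions, which is the ambiguity being pointed out.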
Cheers,
Eric
* Re: parity data
From: Chris Mason @ 2008-09-09 10:43 UTC (permalink / raw)
To: Eric Anopolsky; +Cc: linux-btrfs
On Mon, 2008-09-08 at 18:46 -0600, Eric Anopolsky wrote:
> On Mon, 2008-09-08 at 10:47 -0400, Chris Mason wrote:
> > On Sat, 2008-09-06 at 23:43 -0600, Eric Anopolsky wrote:
> > > Hi,
> > >
> > > A couple of questions.
> > >
> > > 1. Does btrfs currently have anything like raid 5 or 6?
> > >
> >
> > Not yet, it might one day.
> >
> > > 2. One guy on my LUG's mailing list is really excited about the
> > > potential for setting redundancy on a per-file basis.
> > > I.e. /home/eric/criticalfile gets mirrored across all of the drives in
> > > the filesystem but /home/eric/temporaryfile gets striped. I'm skeptical.
> > > Is it a good idea to allow people/programs to do this?
> >
> > In general, yes. Some files or directories are crucial, and some (swap
> > for example) don't need to survive a crash.
>
> If a disk dies in a redundant configuration, I'd like to be able to hot
> replace the failed disk and keep going without any interruption. So
> losing parts of the paging file would be pretty bad in that case.
>
> Isn't a partially failed array (where some files are accessible and
> others are not, without any additional filesystem damage) a weird
> failure mode? Do people know how to deal with this? Do applications know
> how to deal with this?
>
These configurations are not new. Admins create different filesystems
on different storage all the time. From an admin point of view, one
file on their box isn't accessible and they want to carry on.
The fact that it is one file in a single FS or one file among dozens of
filesystems doesn't change things.
> What kind of file would be important enough to keep around but
> unimportant enough that it could be lost at any time while the system is
> up without anyone knowing or caring?
>
> > But, I think the flexibility should go a little further. The goal is to
> > be able to define drive groups and tie files or directory trees to the
> > drive groups. That way you can say these files go to the fastest drives
> > and these files go to some other drive type, etc etc.
>
> Let's say I have 4 100GB drives (2 fast ones and 2 slow ones). I've
> restricted a performance critical directory to the two fastest drives,
> currently totaling 100GB of performance critical data. The rest of the
> data on the system is striped.
>
> How much free space do I have on the filesystem? 100GB (the amount of
> data I can store in the performance critical directory)? 200GB (the
> amount of data I can store outside the performance critical directory if
> the striping is guaranteed)? 300GB (the amount of data I can store
> outside the performance critical directory if the striping is best
> effort)?
>
People already create these configurations, they just do it with
multiple filesystems. And, when they want to resize the performance
critical section, it is a difficult (and often slow) operation.
More flexibility in managing storage is the end goal for btrfs, and
we're just barely getting to the point where we can start addressing
these difficult issues.
-chris
* Re: parity data
From: Paul P Komkoff Jr @ 2008-09-09 14:07 UTC (permalink / raw)
To: Chris Mason; +Cc: Eric Anopolsky, linux-btrfs
Replying to Chris Mason:
I'd like to step into this thread because it's relevant to my
interests.
> > Let's say I have 4 100GB drives (2 fast ones and 2 slow ones). I've
> > restricted a performance critical directory to the two fastest drives,
> > currently totaling 100GB of performance critical data. The rest of the
> > data on the system is striped.
> >
> > How much free space do I have on the filesystem? 100GB (the amount of
> > data I can store in the performance critical directory)? 200GB (the
> > amount of data I can store outside the performance critical directory if
> > the striping is guaranteed)? 300GB (the amount of data I can store
> > outside the performance critical directory if the striping is best
> > effort)?
> >
>
> People already create these configurations, they just do it with
> multiple filesystems. And, when they want to resize the performance
> critical section, it is a difficult (and often slow) operation.
>
> More flexibility in managing storage is the end goal for btrfs, and
> we're just barely getting to the point where we can start addressing
> these difficult issues.
Some time ago I drew up a list of features of an ideal filesystem.
Currently, btrfs (counting everything proposed but not yet implemented)
comes very close. For example, the ability to freely manage the media
pool, in other words to add and remove hard disks of arbitrary size, is
very important now that it's not uncommon to have a box with 24 hard
drives, any of which can fail at any time, and it's economically
infeasible to keep a spare pool of N drives of exactly the same size.
The individual drive size constraint, which is very important in the
traditional layered raid-then-lvm-then-fs approach, is not present in
our ideal case, which allows us to manage our storage more
effectively.
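To put rough numbers on the drive size constraint, here is a small Python sketch. It is an illustration under stated assumptions, not a description of btrfs: it compares a traditional array of mirrored pairs, which treats every member as if it were the smallest drive, with a pool that keeps two copies of each block on any two different drives. Metadata overhead and allocation granularity are ignored, and the drive sizes and function names are invented for the example.

```python
def equal_size_mirror_capacity(sizes_gb):
    """Traditional mirrored array: every member counts as the smallest
    drive, and half of the resulting space holds the second copy."""
    return min(sizes_gb) * len(sizes_gb) // 2


def pooled_mirror_capacity(sizes_gb):
    """Two copies on any two different drives: capped by half the raw
    space, and by the room the other drives have for the second copy
    of whatever lands on the largest drive."""
    total = sum(sizes_gb)
    return min(total // 2, total - max(sizes_gb))


sizes = [500, 1000, 1000, 2000]  # hypothetical mixed drive sizes, in GB
print(equal_size_mirror_capacity(sizes))  # 1000
print(pooled_mirror_capacity(sizes))      # 2250
```

With these made-up sizes the pooled layout exposes more than twice the usable space, which is the kind of gain the paragraph above is describing.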
Another point is per-object locality/redundancy policy. It's a killer
feature because, in the traditional world, you have to manage (resize
and move) all those partitions yourself, which is not very flexible:
with 24 drives you end up creating one raid10, one raid6, and one raid0
on top of them, juggling the underlying partition sizes, and so on.
It is essential to have a filesystem which will do it for you and,
again, do it more efficiently than the 30-year-old way of setting up
"block devices" allows.
--
Paul P 'Stingray' Komkoff Jr // http://stingr.net/key <- my pgp key
This message represents the official view of the voices in my head
* Re: parity data
From: Eric Anopolsky @ 2008-09-10 1:32 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-btrfs
> > Let's say I have 4 100GB drives (2 fast ones and 2 slow ones). I've
> > restricted a performance critical directory to the two fastest drives,
> > currently totaling 100GB of performance critical data. The rest of the
> > data on the system is striped.
> >
> > How much free space do I have on the filesystem? 100GB (the amount of
> > data I can store in the performance critical directory)? 200GB (the
> > amount of data I can store outside the performance critical directory if
> > the striping is guaranteed)? 300GB (the amount of data I can store
> > outside the performance critical directory if the striping is best
> > effort)?
> >
>
> People already create these configurations, they just do it with
> multiple filesystems. And, when they want to resize the performance
> critical section, it is a difficult (and often slow) operation.
I think I'm starting to get it. btrfs would have drive groups, and no
file would have data on more than one drive group at once. That would
make it possible to make meaningful statements about how much free disk
space there is (per drive group). This is almost the same as having
multiple filesystems, except files cannot be assigned to filesystems on
an individual basis. So in a way, btrfs would be replacing some
functionality of the VFS (mapping files to filesystems).
Is that right?
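The model described above can be sketched in a few lines of Python. This is purely a toy under assumptions, since the thread describes drive groups only as a goal: `DriveGroup`, `Pool`, `tie` and `write` are hypothetical names, each directory tree is tied to exactly one group, and free space is tracked per group.

```python
import errno
from dataclasses import dataclass


@dataclass
class DriveGroup:
    name: str
    capacity_gb: int
    used_gb: int = 0

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - self.used_gb


class Pool:
    """Hypothetical pool: one namespace, several drive groups."""

    def __init__(self, groups, default):
        self.groups = {g.name: g for g in groups}
        self.default = default
        self.policy = {}  # directory tree -> group name

    def tie(self, tree, group_name):
        self.policy[tree.rstrip("/")] = group_name

    def group_for(self, path):
        # The longest tied directory tree containing the path wins.
        best = None
        for tree in self.policy:
            if path == tree or path.startswith(tree + "/"):
                if best is None or len(tree) > len(best):
                    best = tree
        name = self.policy[best] if best is not None else self.default
        return self.groups[name]

    def write(self, path, size_gb):
        # A file lives in exactly one group, so only that group's
        # free space decides whether the write fits.
        group = self.group_for(path)
        if size_gb > group.free_gb:
            raise OSError(errno.ENOSPC, f"drive group {group.name} is full")
        group.used_gb += size_gb
        return group.name


pool = Pool([DriveGroup("fast", 200), DriveGroup("slow", 200)], default="slow")
pool.tie("/srv/critical", "fast")
pool.write("/srv/critical/db", 100)
pool.write("/home/eric/temporaryfile", 50)
for group in pool.groups.values():
    print(group.name, group.free_gb)
```

Because every file belongs to a single group, "how much free space is there?" has a well-defined answer per group, even though all paths share one filesystem.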
Cheers,
Eric
* Re: parity data
From: Chris Mason @ 2008-09-10 12:59 UTC (permalink / raw)
To: Eric Anopolsky; +Cc: linux-btrfs
On Tue, 2008-09-09 at 19:32 -0600, Eric Anopolsky wrote:
> > > Let's say I have 4 100GB drives (2 fast ones and 2 slow ones). I've
> > > restricted a performance critical directory to the two fastest drives,
> > > currently totaling 100GB of performance critical data. The rest of the
> > > data on the system is striped.
> > >
> > > How much free space do I have on the filesystem? 100GB (the amount of
> > > data I can store in the performance critical directory)? 200GB (the
> > > amount of data I can store outside the performance critical directory if
> > > the striping is guaranteed)? 300GB (the amount of data I can store
> > > outside the performance critical directory if the striping is best
> > > effort)?
> > >
> >
> > People already create these configurations, they just do it with
> > multiple filesystems. And, when they want to resize the performance
> > critical section, it is a difficult (and often slow) operation.
>
> I think I'm starting to get it. btrfs would have drive groups, and no
> file would have data on more than one drive group at once. That would
> make it possible to make meaningful statements about how much free disk
> space there is (per drive group). This is almost the same as having
> multiple filesystems, except files cannot be assigned to filesystems on
> an individual basis.
Yes, I think this is a fair statement.
> So in a way, btrfs would be replacing some
> functionality of the VFS (mapping files to filesystems).
I think there are many different definitions of the VFS. Mostly what
the VFS does is maintain the dentry and inode caches, and provide a
basic locking framework around most file/inode operations. The VFS is
still doing all the mapping of files to filesystems, and the filesystem
is mapping files to disk blocks.
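The two levels of mapping in that last sentence can be pictured with a toy model in Python. It is only an illustration under heavy assumptions: the real VFS walks dentries rather than comparing path strings, and `ToyVFS`, `ToyFilesystem` and the block numbers are invented for the example.

```python
class ToyFilesystem:
    """Second level: the filesystem maps a file offset to a disk block."""

    def __init__(self, name, block_size=4096):
        self.name = name
        self.block_size = block_size
        self.extents = {}  # relative path -> list of block numbers

    def block_for(self, relpath, offset):
        return self.extents[relpath][offset // self.block_size]


class ToyVFS:
    """First level: the VFS maps a path to the filesystem that owns it."""

    def __init__(self):
        self.mounts = {}  # mount point -> filesystem

    def mount(self, mountpoint, fs):
        self.mounts[mountpoint.rstrip("/") or "/"] = fs

    def resolve(self, path):
        # Pick the longest mount point that is a whole-component prefix.
        best = None
        for mp in self.mounts:
            prefix = "" if mp == "/" else mp
            if path == mp or path.startswith(prefix + "/"):
                if best is None or len(mp) > len(best):
                    best = mp
        if best is None:
            raise FileNotFoundError(path)
        return self.mounts[best], path[len(best):].lstrip("/")


vfs = ToyVFS()
rootfs, homefs = ToyFilesystem("rootfs"), ToyFilesystem("homefs")
vfs.mount("/", rootfs)
vfs.mount("/home", homefs)
homefs.extents["eric/criticalfile"] = [812, 813, 900]

fs, rel = vfs.resolve("/home/eric/criticalfile")
print(fs.name, rel, fs.block_for(rel, 5000))
```

In this picture, drive groups would change only the second level (which disks a filesystem picks blocks from); the path-to-filesystem step stays where it is.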
-chris