* parity data
From: Eric Anopolsky @ 2008-09-07 5:43 UTC
To: linux-btrfs

Hi,

A couple of questions.

1. Does btrfs currently have anything like raid 5 or 6?

2. One guy on my LUG's mailing list is really excited about the
potential for setting redundancy on a per-file basis.
I.e. /home/eric/criticalfile gets mirrored across all of the drives in
the filesystem but /home/eric/temporaryfile gets striped. I'm skeptical.
Is it a good idea to allow people/programs to do this?

Cheers,

Eric

* Re: parity data
From: Chris Mason @ 2008-09-08 14:47 UTC
To: Eric Anopolsky; +Cc: linux-btrfs

On Sat, 2008-09-06 at 23:43 -0600, Eric Anopolsky wrote:
> Hi,
>
> A couple of questions.
>
> 1. Does btrfs currently have anything like raid 5 or 6?
>

Not yet, it might one day.

> 2. One guy on my LUG's mailing list is really excited about the
> potential for setting redundancy on a per-file basis.
> I.e. /home/eric/criticalfile gets mirrored across all of the drives in
> the filesystem but /home/eric/temporaryfile gets striped. I'm skeptical.
> Is it a good idea to allow people/programs to do this?

In general, yes. Some files or directories are crucial, and some (swap
for example) don't need to survive a crash.

But, I think the flexibility should go a little further. The goal is to
be able to define drive groups and tie files or directory trees to the
drive groups. That way you can say these files go to the fastest drives
and these files go to some other drive type, etc etc.

-chris

* Re: parity data
From: Eric Anopolsky @ 2008-09-09 0:46 UTC
To: Chris Mason; +Cc: linux-btrfs

On Mon, 2008-09-08 at 10:47 -0400, Chris Mason wrote:
> On Sat, 2008-09-06 at 23:43 -0600, Eric Anopolsky wrote:
> > Hi,
> >
> > A couple of questions.
> >
> > 1. Does btrfs currently have anything like raid 5 or 6?
> >
>
> Not yet, it might one day.
>
> > 2. One guy on my LUG's mailing list is really excited about the
> > potential for setting redundancy on a per-file basis.
> > I.e. /home/eric/criticalfile gets mirrored across all of the drives in
> > the filesystem but /home/eric/temporaryfile gets striped. I'm skeptical.
> > Is it a good idea to allow people/programs to do this?
>
> In general, yes. Some files or directories are crucial, and some (swap
> for example) don't need to survive a crash.

If a disk dies in a redundant configuration, I'd like to be able to hot
replace the failed disk and keep going without any interruption. So
losing parts of the paging file would be pretty bad in that case.

Isn't a partially failed array (where some files are accessible and
others are not, without any additional filesystem damage) a weird
failure mode? Do people know how to deal with this? Do applications know
how to deal with this?

What kind of file would be important enough to keep around but
unimportant enough that it could be lost at any time while the system is
up without anyone knowing or caring?

> But, I think the flexibility should go a little further. The goal is to
> be able to define drive groups and tie files or directory trees to the
> drive groups. That way you can say these files go to the fastest drives
> and these files go to some other drive type, etc etc.

Let's say I have 4 100GB drives (2 fast ones and 2 slow ones). I've
restricted a performance critical directory to the two fastest drives,
currently totaling 100GB of performance critical data. The rest of the
data on the system is striped.

How much free space do I have on the filesystem? 100GB (the amount of
data I can store in the performance critical directory)? 200GB (the
amount of data I can store outside the performance critical directory if
the striping is guaranteed)? 300GB (the amount of data I can store
outside the performance critical directory if the striping is best
effort)?

I'm open to being convinced otherwise, but I think issues like this
would crop up any time the filesystem is artificially prevented from
load balancing the data across the drives.

Cheers,

Eric

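
The three candidate answers in the message above can be reproduced with a little arithmetic. The Python sketch below is only an illustration of the question, not a description of how btrfs reports free space. It assumes that the 100GB of performance critical data is spread evenly over the two fast drives, that "guaranteed" striping needs the same amount of free space on every drive, and that "best effort" striping may use whatever is free anywhere. The function name and its parameters are made up for this example.

```python
def free_space_views(drive_size_gb, fast_used_gb, n_fast=2, n_slow=2):
    """Return the three free-space figures from the example, in GB.

    Assumption: the pinned data is spread evenly over the fast drives
    and the slow drives are empty.
    """
    fast_free_each = drive_size_gb - fast_used_gb / n_fast
    slow_free_each = drive_size_gb
    free_per_drive = [fast_free_each] * n_fast + [slow_free_each] * n_slow

    # Room left for data pinned to the fast drives only.
    pinned = fast_free_each * n_fast
    # Strict striping over all drives stops when the fullest drive is full.
    strict_stripe = min(free_per_drive) * len(free_per_drive)
    # Best-effort striping can eventually consume every free byte.
    best_effort = sum(free_per_drive)
    return pinned, strict_stripe, best_effort


if __name__ == "__main__":
    print(free_space_views(drive_size_gb=100, fast_used_gb=100))
```

Under these assumptions the function yields 100, 200 and 300 GB for the configuration in the message, which is exactly the ambiguity being asked about: one layout, three defensible answers.
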
* Re: parity data
From: Chris Mason @ 2008-09-09 10:43 UTC
To: Eric Anopolsky; +Cc: linux-btrfs

On Mon, 2008-09-08 at 18:46 -0600, Eric Anopolsky wrote:
> On Mon, 2008-09-08 at 10:47 -0400, Chris Mason wrote:
> > On Sat, 2008-09-06 at 23:43 -0600, Eric Anopolsky wrote:
> > > Hi,
> > >
> > > A couple of questions.
> > >
> > > 1. Does btrfs currently have anything like raid 5 or 6?
> > >
> >
> > Not yet, it might one day.
> >
> > > 2. One guy on my LUG's mailing list is really excited about the
> > > potential for setting redundancy on a per-file basis.
> > > I.e. /home/eric/criticalfile gets mirrored across all of the drives in
> > > the filesystem but /home/eric/temporaryfile gets striped. I'm skeptical.
> > > Is it a good idea to allow people/programs to do this?
> >
> > In general, yes. Some files or directories are crucial, and some (swap
> > for example) don't need to survive a crash.
>
> If a disk dies in a redundant configuration, I'd like to be able to hot
> replace the failed disk and keep going without any interruption. So
> losing parts of the paging file would be pretty bad in that case.
>
> Isn't a partially failed array (where some files are accessible and
> others are not, without any additional filesystem damage) a weird
> failure mode? Do people know how to deal with this? Do applications know
> how to deal with this?
>

These configurations are not new. Admins create different filesystems
on different storage all the time. From an admin point of view, one
file on their box isn't accessible and they want to carry on. The fact
that it is one file in a single FS or one file among dozens of
filesystems doesn't change things.

> What kind of file would be important enough to keep around but
> unimportant enough that it could be lost at any time while the system is
> up without anyone knowing or caring?
>
> > But, I think the flexibility should go a little further. The goal is to
> > be able to define drive groups and tie files or directory trees to the
> > drive groups. That way you can say these files go to the fastest drives
> > and these files go to some other drive type, etc etc.
>
> Let's say I have 4 100GB drives (2 fast ones and 2 slow ones). I've
> restricted a performance critical directory to the two fastest drives,
> currently totaling 100GB of performance critical data. The rest of the
> data on the system is striped.
>
> How much free space do I have on the filesystem? 100GB (the amount of
> data I can store in the performance critical directory)? 200GB (the
> amount of data I can store outside the performance critical directory if
> the striping is guaranteed)? 300GB (the amount of data I can store
> outside the performance critical directory if the striping is best
> effort)?
>

People already create these configurations, they just do it with
multiple filesystems. And, when they want to resize the performance
critical section, it is a difficult (and often slow) operation.

More flexibility in managing storage is the end goal for btrfs, and
we're just barely getting to the point where we can start addressing
these difficult issues.

-chris

* Re: parity data
From: Paul P Komkoff Jr @ 2008-09-09 14:07 UTC
To: Chris Mason; +Cc: Eric Anopolsky, linux-btrfs

Replying to Chris Mason:

I'd like to step into this thread because it's relevant to my interests.

> > Let's say I have 4 100GB drives (2 fast ones and 2 slow ones). I've
> > restricted a performance critical directory to the two fastest drives,
> > currently totaling 100GB of performance critical data. The rest of the
> > data on the system is striped.
> >
> > How much free space do I have on the filesystem? 100GB (the amount of
> > data I can store in the performance critical directory)? 200GB (the
> > amount of data I can store outside the performance critical directory if
> > the striping is guaranteed)? 300GB (the amount of data I can store
> > outside the performance critical directory if the striping is best
> > effort)?
> >
>
> People already create these configurations, they just do it with
> multiple filesystems. And, when they want to resize the performance
> critical section, it is a difficult (and often slow) operation.
>
> More flexibility in managing storage is the end goal for btrfs, and
> we're just barely getting to the point where we can start addressing
> these difficult issues.

Some time ago I created a list of features of an ideal filesystem.
Currently, btrfs (with all the things that are proposed but not yet
implemented) is very close.

For example, the ability to freely manage the media pool, IOW to add and
remove hard disks of arbitrary size, is very important now, when it's
not uncommon to have a box with 24 hard drives, each of which can fail
at any time, and it's economically unfeasible to keep a spare pool of N
drives of exactly the same size. The individual drive size constraint,
which is very important in the traditional layered raid-then-lvm-then-fs
approach, is not present in our ideal case, which allows us to manage
our storage more effectively.

Another point is per-object locality/redundancy policy. It's a killer
feature, because, in a traditional world, you'll have to manage (resize
and move) all those partitions around, which is not very flexible given
that you might have 24 drives and then you'll have to create one raid10,
one raid6, and one raid0 on top of them, juggling the underlying
partition sizes, etc, you know. It is essential to have a filesystem
which will do it for you, again, with more efficiency than you can
extract from the 30-year-old way of setting up "block devices".

--
Paul P 'Stingray' Komkoff Jr // http://stingr.net/key <- my pgp key
This message represents the official view of the voices in my head

* Re: parity data
From: Eric Anopolsky @ 2008-09-10 1:32 UTC
To: Chris Mason; +Cc: linux-btrfs

> > Let's say I have 4 100GB drives (2 fast ones and 2 slow ones). I've
> > restricted a performance critical directory to the two fastest drives,
> > currently totaling 100GB of performance critical data. The rest of the
> > data on the system is striped.
> >
> > How much free space do I have on the filesystem? 100GB (the amount of
> > data I can store in the performance critical directory)? 200GB (the
> > amount of data I can store outside the performance critical directory if
> > the striping is guaranteed)? 300GB (the amount of data I can store
> > outside the performance critical directory if the striping is best
> > effort)?
> >
>
> People already create these configurations, they just do it with
> multiple filesystems. And, when they want to resize the performance
> critical section, it is a difficult (and often slow) operation.

I think I'm starting to get it. btrfs would have drive groups, and no
file would have data on more than one drive group at once. That would
make it possible to make meaningful statements about how much free disk
space there is (per drive group). This is almost the same as having
multiple filesystems, except files cannot be assigned to filesystems on
an individual basis. So in a way, btrfs would be replacing some
functionality of the VFS (mapping files to filesystems).

Is that right?

Cheers,

Eric

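
The drive-group reading in the message above can be pictured with a toy model: if every file lives in exactly one drive group, free space becomes a plain per-group number. The Python sketch below is hypothetical throughout. The `DriveGroup` class, the group names, and the assumption that a group's usable capacity is simply the sum of its drives (no mirroring or parity overhead) are illustrative choices and are not taken from btrfs.

```python
from dataclasses import dataclass


@dataclass
class DriveGroup:
    """Toy drive group: a named set of drives with one usage counter."""
    name: str
    drive_sizes_gb: list
    used_gb: int = 0

    @property
    def capacity_gb(self):
        # Assumption: data is striped inside the group, so capacity is
        # the plain sum of the member drives.
        return sum(self.drive_sizes_gb)

    @property
    def free_gb(self):
        return self.capacity_gb - self.used_gb

    def store(self, size_gb):
        # A file never spills into another group, so a full group fails
        # even if other groups still have room.
        if size_gb > self.free_gb:
            raise OSError(f"drive group {self.name!r} is full")
        self.used_gb += size_gb


def report(groups):
    """Free space per drive group, in GB."""
    return {g.name: g.free_gb for g in groups}


# The 4 x 100GB example: two fast drives holding 100GB, two empty slow drives.
fast = DriveGroup("fast", [100, 100], used_gb=100)
slow = DriveGroup("slow", [100, 100])

if __name__ == "__main__":
    print(report([fast, slow]))
```

In this model the question "how much free space is there?" has one answer per group rather than one ambiguous answer for the whole pool, which is the point of the per-drive-group statement above.
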
* Re: parity data
From: Chris Mason @ 2008-09-10 12:59 UTC
To: Eric Anopolsky; +Cc: linux-btrfs

On Tue, 2008-09-09 at 19:32 -0600, Eric Anopolsky wrote:
> > > Let's say I have 4 100GB drives (2 fast ones and 2 slow ones). I've
> > > restricted a performance critical directory to the two fastest drives,
> > > currently totaling 100GB of performance critical data. The rest of the
> > > data on the system is striped.
> > >
> > > How much free space do I have on the filesystem? 100GB (the amount of
> > > data I can store in the performance critical directory)? 200GB (the
> > > amount of data I can store outside the performance critical directory if
> > > the striping is guaranteed)? 300GB (the amount of data I can store
> > > outside the performance critical directory if the striping is best
> > > effort)?
> > >
> >
> > People already create these configurations, they just do it with
> > multiple filesystems. And, when they want to resize the performance
> > critical section, it is a difficult (and often slow) operation.
>
> I think I'm starting to get it. btrfs would have drive groups, and no
> file would have data on more than one drive group at once. That would
> make it possible to make meaningful statements about how much free disk
> space there is (per drive group). This is almost the same as having
> multiple filesystems, except files cannot be assigned to filesystems on
> an individual basis.

Yes, I think this is a fair statement.

> So in a way, btrfs would be replacing some
> functionality of the VFS (mapping files to filesystems).

I think there are many different definitions of the VFS. Mostly what
the VFS does is maintain the dentry and inode caches, and provide a
basic locking framework around most file/inode operations.

The VFS is still doing all the mapping of files to filesystems, and the
filesystem is mapping files to disk blocks.

-chris

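
The division of labour described in the reply above can be pictured as two separate lookups. The Python sketch below is a deliberately simplified toy, not kernel code: the real VFS resolves paths component by component through the dentry cache, whereas this model just picks the longest matching mount point. The mount table, the `Filesystem` class, and the per-directory `placement` policy are all hypothetical names introduced for illustration.

```python
def _is_under(path, prefix):
    """True if path equals prefix or lies below it (whole components only)."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


class Filesystem:
    """Toy filesystem that decides placement only for files it owns."""

    def __init__(self, name, placement):
        self.name = name
        # Hypothetical policy: directory (relative to the mount) -> drive group.
        self.placement = placement

    def drive_group_for(self, rel_path):
        matches = [p for p in self.placement if _is_under(rel_path, p)]
        return self.placement[max(matches, key=len)]


def vfs_lookup(mounts, path):
    """Layer 1 (VFS stand-in): map a path to a mounted filesystem."""
    mount_point = max((m for m in mounts if _is_under(path, m)), key=len)
    rel_path = path[len(mount_point.rstrip("/")):] or "/"
    return mounts[mount_point], rel_path


def resolve(mounts, path):
    """Layer 2 (filesystem): map the file to storage inside that filesystem."""
    fs, rel_path = vfs_lookup(mounts, path)
    return fs.name, fs.drive_group_for(rel_path)


mounts = {
    "/": Filesystem("rootfs", {"/": "single"}),
    "/data": Filesystem("pool", {"/": "slow", "/critical": "fast"}),
}

if __name__ == "__main__":
    for p in ("/etc/fstab", "/data/tmp/x", "/data/critical/db"):
        print(p, resolve(mounts, p))
```

The first lookup never needs to know about drive groups, and the second never needs to know about mount points, which is why a per-directory placement policy inside one filesystem does not take over the VFS's job.
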