* raid0/jbod/lvm, sorta?
@ 2009-12-30 0:03 Matt Garman
From: Matt Garman @ 2009-12-30 0:03 UTC (permalink / raw)
To: linux-raid
Is there a way, with Linux md (or maybe lvm), to create a single mass
storage device from many physical drives, but with the property that
if one drive fails, not all data is lost, AND with no redundancy?
I.e., similar to RAID-0, but if one drive dies, all data (except that
on the failed drive) is still readily available?
Motivation:
I currently have a four-disc RAID5 device for media storage.  The
typical usage pattern is few writes, many reads, lots of idle time.
I got to thinking, with proper backups, RAID really only buys me
availability or performance, neither of which is a priority.
Modern single-disc speed is more than enough, and high availability
isn't a requirement for a home media server.
So I have four discs constantly running, using a fair amount of
power. And I need more space, so the power consumption only goes
up.
I experimented a while with letting the drives spin down (hdparm -S),
but (1) it was obnoxious waiting for all four discs to spin up when I
wanted the data (they spun up in series---good for the power supply,
bad for latency); and (2) I felt that having all four discs spin up
was too much wear and tear on the drives, when, in principle, only
one drive needed to spin up.
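(For reference, the standby timeout is set per drive with something
like the line below; the value is just an example.  hdparm -S values
from 1 to 240 are multiples of five seconds.)

    # spin the drive down after 10 minutes of idle (120 * 5 s)
    hdparm -S 120 /dev/sdb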
I got to thinking, I could just have a bunch of individual drives,
let them all spin down, and when data is needed, spin up only the one
drive that has the data I want.  Less wear and tear overall, lower
overall power consumption, and lower access latency (compared to
spinning up the whole RAID).
I know I could do this manually with symlinks. E.g., have a
directory like /bigstore that contains symlinks into /mnt/drive1,
/mnt/drive2, /mnt/drive3, etc. And then if one drive dies, the
whole store isn't trashed. This seems fairly simple, so I wonder if
there's not some automatic way to do it. Hence, this email. :)
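(For a flat directory layout, the manual version amounts to something
like this; the paths are only illustrative.)

    # rebuild /bigstore as a tree of symlinks into the member disks
    mkdir -p /bigstore
    for d in /mnt/drive1 /mnt/drive2 /mnt/drive3; do
        ln -sf "$d"/* /bigstore/
    done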
Thanks for any thoughts or suggestions!
Matt
* Fwd: raid0/jbod/lvm, sorta?
From: Kevin Maguire @ 2009-12-30 0:29 UTC (permalink / raw)
To: linux-raid

Hi

You can certainly do this, or close to it, with multiple OSS devices
in Lustre:

http://en.wikipedia.org/wiki/Lustre_%28file_system%29

For me Lustre was a bit of a b!tch to set up and manage, but YMMV.

KM
* Re: Fwd: raid0/jbod/lvm, sorta?
From: Roger Heflin @ 2009-12-30 0:37 UTC (permalink / raw)
To: Kevin Maguire; +Cc: linux-raid

Kevin Maguire wrote:
> You can certainly do this, or close to it, with multiple OSS devices
> in Lustre:
>
> http://en.wikipedia.org/wiki/Lustre_%28file_system%29
>
> For me Lustre was a bit of a b!tch to set up and manage, but YMMV.

"A bit of a b!tch" is an understatement.
* Re: raid0/jbod/lvm, sorta?
From: Roger Heflin @ 2009-12-30 0:43 UTC (permalink / raw)
To: Matt Garman; +Cc: linux-raid

Matt Garman wrote:
> Is there a way, with Linux md (or maybe lvm), to create a single mass
> storage device from many physical drives, but with the property that
> if one drive fails, not all data is lost, AND with no redundancy?
>
> I.e., similar to RAID-0, but if one drive dies, all data (except that
> on the failed drive) is still readily available?
> [...]

I was thinking the best way would be to use the raid5 like you are,
but have a "cache" drive/device.  The cache device has all the
filesystem entries but only points over to the other data (an
HSM-style system), and when you try to read one of the files it spins
things up and copies that file onto the cache (when the copy is done,
the raid array spins down).

The same thing would be done on recording: it writes to the cache
device and then moves/syncs the new files onto the raid array, either
when the cache is getting close to full or every so often via a cron
job.

Now, exactly how to write a kernel module or some other service to
manage this I am not sure about, but in my case it would allow me to
spin down a number of spindles for a large portion of the day... and
those things run 5-10 W per spindle, so a lot of power.

If someone has ideas of where to start, or some program/kernel modules
to start with, or knows of something that would already do this, it
would seem useful.
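(A rough sketch of the cron-driven half of the idea above, i.e.
flushing new files from an always-on cache disk onto the big array
once the cache passes a usage threshold.  The paths and the 80%
threshold are made up; the read side, spinning the array up on access
and copying files back onto the cache, would need real HSM-style
software rather than a shell script.)

    #!/bin/sh
    # flush new files from the cache disk to the array when the cache
    # is getting close to full (illustrative paths and threshold)
    CACHE=/cache
    ARRAY=/bigraid
    used=$(df -P "$CACHE" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
    if [ "$used" -ge 80 ]; then
        rsync -a --remove-source-files "$CACHE"/ "$ARRAY"/
    fi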
* Re: raid0/jbod/lvm, sorta?
From: Mikael Abrahamsson @ 2009-12-30 6:13 UTC (permalink / raw)
To: Matt Garman; +Cc: linux-raid

On Tue, 29 Dec 2009, Matt Garman wrote:

> I know I could do this manually with symlinks.  E.g., have a directory
> like /bigstore that contains symlinks into /mnt/drive1, /mnt/drive2,
> /mnt/drive3, etc.  And then if one drive dies, the whole store isn't
> trashed.  This seems fairly simple, so I wonder if there's not some
> automatic way to do it.  Hence, this email. :)

Unionfs will do this for you.

--
Mikael Abrahamsson    email: swmike@swm.pp.se
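(For reference, a union mount over the member disks looks roughly like
the following.  The branch-list syntax differs between unionfs releases
and between unionfs and aufs, so treat the exact option names as
illustrative, and note that writes normally land on the first branch.)

    mount -t unionfs -o dirs=/mnt/drive1=rw:/mnt/drive2=rw:/mnt/drive3=rw \
          unionfs /bigstore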
* Re: raid0/jbod/lvm, sorta?
From: Jan Ceuleers @ 2009-12-30 9:31 UTC (permalink / raw)
To: Mikael Abrahamsson; +Cc: Matt Garman, linux-raid

Mikael Abrahamsson wrote:
> Unionfs will do this for you.

The requirements stated "few writes", not "no writes".  Correct me if
I'm wrong, but I think that unionfs only allows writes to the
highest-precedence branch.  So writes to the store would all end up on
one disk.

Jan
* Re: raid0/jbod/lvm, sorta?
From: Mikael Abrahamsson @ 2009-12-30 9:36 UTC (permalink / raw)
To: linux-raid

On Wed, 30 Dec 2009, Jan Ceuleers wrote:

> Correct me if I'm wrong, but I think that unionfs only allows writes to
> the highest-precedence branch.  So writes to the store would all end up
> on one disk.

Since he said he would be ok with symlinks, I interpreted that he'd
also be ok with moving things around between disks when the disk
taking all the writes became full.

--
Mikael Abrahamsson    email: swmike@swm.pp.se
* Re: raid0/jbod/lvm, sorta?
From: Luca Berra @ 2009-12-30 7:29 UTC (permalink / raw)
To: linux-raid

On Tue, Dec 29, 2009 at 06:03:57PM -0600, Matt Garman wrote:
> I know I could do this manually with symlinks.  E.g., have a
> directory like /bigstore that contains symlinks into /mnt/drive1,
> /mnt/drive2, /mnt/drive3, etc.  And then if one drive dies, the
> whole store isn't trashed.  This seems fairly simple, so I wonder if
> there's not some automatic way to do it.  Hence, this email. :)

http://www.filesystems.org/project-unionfs.html

Maybe your distribution includes it already.

L.

--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
* Re: raid0/jbod/lvm, sorta?
From: Matt Garman @ 2009-12-30 14:15 UTC (permalink / raw)
To: Leslie Rhorer, linux-raid

On Tue, Dec 29, 2009 at 11:12:42PM -0600, Leslie Rhorer wrote:
> > I.e., similar to RAID-0, but if one drive dies, all data (except
> > that on the failed drive) is still readily available?
>
> I can't imagine why anyone would want this.  If your data isn't
> important - and the fact one is sanguine about losing any random
> fraction of it argues this quite strongly - then RAID 0 fits the
> bill.  If maintaining the data is important, then one needs
> redundancy.

Well, I do have real backups.  So in my vision, I wouldn't really be
losing data, just temporarily without it.

The point was to minimize data restore in the case of a failure.  Say
I have a 10 x 1 GB RAID-0 array.  That's 10 GB I have to restore in
the case of a drive failure.  In my scenario, I only have to restore
1 GB.

> > I currently have a four-disc RAID5 device for media storage.  The
> > typical usage pattern is few writes, many reads, lots of idle
> > time.  I got to thinking, with proper backups, RAID really only
> > buys me availability or performance, neither of which is a
> > priority.
>
> RAID 0 provides neither, and is designed only to provide
> additional storage capacity.

I was under the impression that most people used RAID-0 for the
performance benefits?  I.e., multiple spindles.

> > So I have four discs constantly running, using a fair amount of
> > power.  And I need more space, so the power consumption only
> > goes up.
>
> Get lower power drives.  WD green drives use much less power than
> any other drive I have tried, but if you find drives with even
> lower consumption, go with them.  Of course, SSDs have even lower
> power consumption, but are quite expensive.

Those are exactly what I have. :)  See notes[1] below.

> > and (2) I felt that having all four discs spin up was too much
> > wear and tear on the drives, when, in principle, only one drive
> > needed to spin up.
>
> This isn't generally going to be true.  First of all, the odds are
> moderately high the data you seek is going to span multiple
> drives, even if it is not striped.

In my vision, each file would be strictly written to only one
physical device.

> Secondly, at a very minimum the superblocks and directory
> structures are going to have to be read multiple times.  These are
> very likely to span 2 or 3 drives or even more.

While I'm dreaming, I might as well add that either this information
is mirrored across all drives and/or cached in RAM. :)

> > I know I could do this manually with symlinks.  E.g., have a
> > directory like /bigstore that contains symlinks into
> > /mnt/drive1, /mnt/drive2, /mnt/drive3, etc.
>
> Well, that would work for reading the files, but not for much
> else.  File creations would be almost nightmarish, and file writes
> would be fraught with disaster.  What happens when the "array" has
> plenty of space, but one drive has less than enough to write the
> entire file?  In general, one cannot know a priori how much space
> a file will take unless it is simply being copied from one place

I guess I didn't think about the general file writing case.  For me,
99% of all files are put on my array via simple copy.  So the exact
file size is known in advance.

In the general file creation/file writing case, I guess I'd just pick
the drive with the most free space and start writing.  If that drive
runs out of space, writes simply fail.  Although, I can see how this
would drive people insane, seeing their file writes fail when "df"
says they have plenty of free space!  (Maybe, for the purposes of
tools like "df", free space would be equal to the largest amount of
free space on a single drive.  E.g., if you have 9 drives with 1 GB
free, and 1 with 2 GB free, df says you have 2 GB free.)

> to another.  For that matter, what happens when all 10 drives in
> the array have 1G left on them, and one wishes to write a 5G file?

You have to buy a new drive, delete files off an existing drive, or
maybe even have some fancy "defrag"-like utility that shuffles whole
files around the drives.

> I think the very limited benefits of what you seek are far
> outweighed by the pitfalls and trouble of implementing it.  It
> might be possible to cache the drive structures somewhere and then
> attempt to only spin up the required drives, but a fragmented
> semi-array is a really bad idea, if you ask me.  Even attempting
> the former would require a much closer association between the file
> systems and the underlying arrays than is now the case, and
> perhaps moreso than is prudent.

Now that you point out the more general use cases of what I was
describing, I agree it's definitely not trivial.  I wasn't really
suggesting someone go off and implement this, as much as seeing if
something already existed.  I'll probably look into UnionFS, as many
people suggested.  Or, for my narrow requirements, I could probably
get away with manual management and some simple scripts.  I might not
even need that, as, e.g., MythTV can be pointed to a root directory
and find all files below (at least for pre-existing files).  <shrug>

[1] Regarding the WD GreenPower drives: I don't get the full benefit
    of these drives, because the "head parking" feature of those
    drives doesn't really work for me.  I started a discussion on
    this a while ago, but the gist is: the heads will park/unload,
    but only briefly.  Generally within five minutes, something
    causes them to unpark.  I was unable to track down what caused
    that.

    Said discussion was titled "linux disc access when idle", on this
    mailing list: http://marc.info/?l=linux-raid&m=125078611926294&w=2

    Even without the head parking, they are still among the lowest
    powered drives, although the 5900 rpm drives from Seagate and the
    5400 rpm "EcoGreen" drives from Samsung are similar.  This is
    according to SilentPCReview.com, whose results are consistent
    with my experience.
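(The "df reports the largest free space on any single member disk"
semantics described a few paragraphs up is easy to approximate from
userspace; the mount points here are illustrative.)

    # report free space as the largest free space on any one member disk
    df -P /mnt/drive* | awk 'NR > 1 && $4 > max { max = $4 }
                             END { printf "largest contiguous free: %d KB\n", max }'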
* RE: raid0/jbod/lvm, sorta?
From: Leslie Rhorer @ 2009-12-30 18:25 UTC (permalink / raw)
To: 'Matt Garman', 'linux-raid'

> On Tue, Dec 29, 2009 at 11:12:42PM -0600, Leslie Rhorer wrote:
> > I can't imagine why anyone would want this.  If your data isn't
> > important - and the fact one is sanguine about losing any random
> > fraction of it argues this quite strongly - then RAID 0 fits the
> > bill.  If maintaining the data is important, then one needs
> > redundancy.
>
> Well, I do have real backups.  So in my vision, I wouldn't really be
> losing data, just temporarily without it.

Surely, and this is always the case (one hopes) when one has valid
backups.

> The point was to minimize data restore in the case of a failure.
> Say I have a 10 x 1 GB RAID-0 array.  That's 10 GB I have to restore
> in the case of a drive failure.  In my scenario, I only have to
> restore 1 GB.

While I can certainly empathize with the desire to limit one's
workload when a failure occurs, I think one should weigh the extra
effort required in such an eventuality against the management issues
encountered in achieving such a goal.  Drive failures are a fairly
rare event.  The daily hassle of trying to deal with a fragmented
drive topology would, I feel, far exceed the hassle of a very
occasional large restore event.

> > RAID 0 provides neither, and is designed only to provide
> > additional storage capacity.
>
> I was under the impression that most people used RAID-0 for the
> performance benefits?  I.e., multiple spindles.

Some do, yes, as that is one benefit.  Also, in some cases a group of
smaller drives may be less expensive than one larger drive, although
these days the largest drive size is also the least expensive per
byte.  I think most people who employ RAID0, however, do so primarily
to allow for a greater volume size.  Of course, there is nothing
preventing one from having all three benefits in mind, nor from
enjoying all three benefits even if two are not a priority.

> In my vision, each file would be strictly written to only one
> physical device.

That's pretty inefficient.

> While I'm dreaming, I might as well add that either this information
> is mirrored across all drives and/or cached in RAM. :)

Surely, but again, in order to implement such a scenario the file
system layer is going to have to tell the array layer where to put
each mirror, requiring it to have a knowledge of the underlying
topology the file system does not normally have.  Alternately, the
info could be cached in memory, of course, but either way the file
system is going to have to be designed with this specifically in
mind.  The array ordinarily has no idea which blocks to be written are
associated with which file, and the file system may be writing
multiple files simultaneously.

> I guess I didn't think about the general file writing case.  For me,
> 99% of all files are put on my array via simple copy.  So the exact
> file size is known in advance.

By you, yes, but not necessarily by the file system.  I don't think
most file systems check beforehand whether the file being written just
happens to be a copy and there just happens to be enough room for the
copy to finish.  Of course, backup utilities often do this very thing,
but that is a specialized application.  File systems are generally
designed to handle the more general cases effectively, rather than
targeting efficiency for special cases.  That said, you might find one
which has tuning available for the more specific cases.  Some file
systems these days are also starting to take more notice of the
underlying topology, and the developers of both array utilities and
file systems are starting to have their systems talk more loquaciously
to one another.

> In the general file creation/file writing case, I guess I'd just
> pick the drive with the most free space and start writing.  If that
> drive runs out of space, writes simply fail.

See my comment above and the one I posted previously below.

> Although, I can see how this would drive people insane, seeing their
> file writes fail when "df" says they have plenty of free space!

See my comments above and below, twice.

> > For that matter, what happens when all 10 drives in the array have
> > 1G left on them, and one wishes to write a 5G file?
>
> You have to buy a new drive, delete files off an existing drive, or
> maybe even have some fancy "defrag"-like utility that shuffles whole
> files around the drives.

Have my comments above and below tattooed on your eyelids.  Dealing
with such issues on a regular basis would be a monumental headache,
especially when the failing writes may be autonomous.

I suppose you could keep a fairly large "scratch" drive (or small
array) for general purpose reads and writes, along with some number of
mounted "permanent" drives.  You could write a simple script, run
against any files you want to have on the permanent system, that
checks to see where the file will fit, copies it, and then creates the
symlinks in a specialized directory.

For example, you could create three directories, /Permanent,
/Transition, and /OverSize, on your scratch drive / array.  Any file
you wish to put on a permanent drive you can simply either create in
the /Transition directory in the first place, or else move to the
/Transition directory when you want to make the file "permanent".
Have a cron job run every few minutes or so that checks the
/Transition directory for files that are there and not opened by any
other process.  Have the script determine the drive with the largest
free space, make sure the file will fit, and then move the file,
creating a symlink in /Permanent.  If it won't fit, have the script
notify the admin via e-mail and move the file over to the /OverSize
directory.  Note that any application which thinks it knows where to
find the files won't be able to do so any longer.

> Now that you point out the more general use cases of what I was
> describing, I agree it's definitely not trivial.  [...]
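(A minimal sketch of the kind of cron job described above.  Directory
names, member mount points, and the admin address are all illustrative,
and a real version would also need to cope with spaces in names and
with overlapping runs.)

    #!/bin/sh
    # move settled files from /Transition onto the member disk with the
    # most free space, leaving a symlink behind in /Permanent
    for f in /Transition/*; do
        [ -f "$f" ] || continue
        fuser -s "$f" && continue                  # still open somewhere, skip
        need=$(du -k "$f" | cut -f1)
        # member disk with the most free space (KB available, mount point)
        set -- $(df -P /mnt/drive* | awk 'NR > 1 { print $4, $6 }' | sort -rn | head -1)
        avail=$1; dest=$2
        if [ "$need" -lt "$avail" ]; then
            mv "$f" "$dest"/ && ln -s "$dest/$(basename "$f")" /Permanent/
        else
            mv "$f" /OverSize/
            echo "$f does not fit on any single member disk" \
                | mail -s "bigstore: file moved to /OverSize" admin@example.com
        fi
    done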
Thread overview: 10+ messages (newest: 2009-12-30 18:25 UTC)
2009-12-30  0:03 raid0/jbod/lvm, sorta? Matt Garman
     [not found] ` <db9fa2350912291628n170ec385qabb8e52f37da9ed0@mail.gmail.com>
2009-12-30  0:29   ` Fwd: " Kevin Maguire
2009-12-30  0:37     ` Roger Heflin
2009-12-30  0:43 ` Roger Heflin
2009-12-30  6:13 ` Mikael Abrahamsson
2009-12-30  9:31   ` Jan Ceuleers
2009-12-30  9:36     ` Mikael Abrahamsson
2009-12-30  7:29 ` Luca Berra
     [not found] ` <52.81.08392.151EA3B4@cdptpa-omtalb.mail.rr.com>
2009-12-30 14:15   ` Matt Garman
2009-12-30 18:25     ` Leslie Rhorer