* petabyte class archival filestore wanted/proposed
From: Jeff Anderson-Lee @ 2006-06-22 16:43 UTC (permalink / raw)
To: linux-fsdevel, linux-kernel
I'm part of a project at the University of California, Berkeley that is
trying to put together a predominantly archival file system for
petabyte-class data stores using Linux with clusters of commodity server
hardware. We currently have multiple terabytes of hardware on top of
which we intend to build such a system. However, our hope is that the
end system would be useful for a wide range of users, from someone with
three large disks or three disk servers to groups with three or more
distributed storage sites.
Main Goals/Features:
1) Tapeless: maintain multiple copies on disk (minimize
backup/restore lag)
2) "Mirroring" across remote sites: for disaster recovery (we sit on
top of the Hayward Fault)
3) Persistent snapshots: as archival copies instead of
backup/restore scanning
4) Copy-On-Write: in support of snapshots/archives
5) Append-mostly log-structured file system: makes synchronization of
remote mirrors easier (tail the log; see the rough sketch after this list).
6) Avoid (insofar as possible) single points of failure and
bottlenecks (for scalability)
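To make (5) a bit more concrete, here is a toy user-space sketch of what
"tail the log" could look like. The record format and file names are made
up purely for illustration; this is not code from any existing system.

    /* append_log.c -- illustration only.  An append-only record log that a
     * remote mirror replays by remembering the last byte offset it has seen. */
    #include <stdio.h>
    #include <stdint.h>

    /* Append one length-prefixed record to the log. */
    static int append_record(FILE *log, const void *data, uint32_t len)
    {
        if (fwrite(&len, sizeof(len), 1, log) != 1)
            return -1;
        if (fwrite(data, 1, len, log) != len)
            return -1;
        fflush(log);
        return 0;
    }

    /* "Tail the log": replay every record after *offset into the mirror,
     * then advance *offset.  A real mirror would apply the records to its
     * own storage and persist the offset; here we just copy them. */
    static int catch_up(FILE *log, FILE *mirror, long *offset)
    {
        uint32_t len;
        char buf[4096];

        if (fseek(log, *offset, SEEK_SET) != 0)
            return -1;
        while (fread(&len, sizeof(len), 1, log) == 1 && len <= sizeof(buf)) {
            if (fread(buf, 1, len, log) != len)
                break;          /* partial record: retry on the next pass */
            fwrite(&len, sizeof(len), 1, mirror);
            fwrite(buf, 1, len, mirror);
            *offset = ftell(log);
        }
        fflush(mirror);
        return 0;
    }

    int main(void)
    {
        FILE *log = fopen("primary.log", "a+b");
        FILE *mirror = fopen("mirror.log", "a+b");
        long mirror_offset = 0;

        if (!log || !mirror)
            return 1;
        append_record(log, "write block 17", 14);
        append_record(log, "write block 42", 14);
        catch_up(log, mirror, &mirror_offset);
        printf("mirror caught up to offset %ld\n", mirror_offset);
        fclose(log);
        fclose(mirror);
        return 0;
    }

Calling catch_up() again after further appends replays only the records
added since the previous call, which is what should make synchronizing a
remote mirror cheap.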
I've looked into the existing file systems I know about, and none of
them seem to fit the bill.
Parts of the OpenSolaris ZFS file system look interesting, except that
(a) it is not on Linux and (b) it seems to mix together too many levels
(volume manager and file system). However, I can see how using some of
its concepts and implementing something like it on top of an
append-mostly distributed logical device might work. Splitting the
project into two parts, (a) a robust, distributed logical block device
and (b) a flexible file system with snapshots, might make it easier to
design and build.
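As a rough sketch of the file-system half of that split: the snapshot and
copy-on-write machinery can be little more than a block map that is copied
(not the data) at snapshot time, with every new write going to the tail of
the underlying append-mostly device. The toy below only illustrates the
idea; the sizes and names are invented.

    /* cow_map.c -- illustration only: a toy copy-on-write block map of the
     * kind a snapshotting file system could keep above an append-mostly
     * logical device. */
    #include <stdio.h>
    #include <string.h>

    #define NBLOCKS 8

    struct blkmap {
        long where[NBLOCKS];   /* logical block -> offset in the append-only log */
    };

    static long log_tail;      /* next free position in the (imaginary) log */

    /* Writes never overwrite: new data goes to the log tail and only the
     * map entry changes, so older map copies still see the older data. */
    static void cow_write(struct blkmap *m, int block, long len)
    {
        m->where[block] = log_tail;
        log_tail += len;
    }

    /* A snapshot is just a copy of the map; no data is copied. */
    static struct blkmap snapshot(const struct blkmap *m)
    {
        return *m;
    }

    int main(void)
    {
        struct blkmap live;
        memset(&live, 0xff, sizeof(live));      /* all entries -1 = "hole" */

        cow_write(&live, 3, 4096);
        struct blkmap snap = snapshot(&live);   /* archival point-in-time view */
        cow_write(&live, 3, 4096);              /* rewrite block 3 afterwards */

        printf("block 3: snapshot sees offset %ld, live sees %ld\n",
               snap.where[3], live.where[3]);
        return 0;
    }

A persistent snapshot then costs one map copy, and because nothing written
before the snapshot is ever overwritten, it stays consistent without any
backup-style scanning.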
Before we begin, however, it is important to find out:
1) Is there anything sufficiently like this to either (a) use
instead, or (b) start from?
2) Is there community support for insertion in the main kernel tree
(without which it is just another toy project)?
3) Anyone care to join in (a) design, (b) implementation, or (c)
testing?
I have been contemplating this for some time and do have some ideas that
I would be happy to share with any and all interested.
Jeff Anderson-Lee
Petabyte Storage Infrastructure Project
University of California at Berkeley
* Re: petabyte class archival filestore wanted/proposed
From: Bryan Henderson @ 2006-06-22 18:19 UTC (permalink / raw)
To: Jeff Anderson-Lee; +Cc: linux-fsdevel
> 1) Tapeless: maintain multiple copies on disk (minimize
>backup/restore lag)
Can you really call it archival if you're willing to pay 5 times as much
for quick access? Maybe you need a different word. Archive means large
quantities of data with very low access frequency. And sometimes, in the
current legal climate, with very low chance of destruction.
You word this as if the only potential use of tape is backup of disk-based
data, but it's also pretty useful as the primary copy of archival data.
--
Bryan Henderson                          IBM Almaden Research Center
San Jose CA                              Filesystems
* Re: petabyte class archival filestore wanted/proposed
From: Jeff Anderson-Lee @ 2006-06-22 18:58 UTC (permalink / raw)
To: Bryan Henderson; +Cc: linux-fsdevel
Bryan Henderson wrote:
>> 1) Tapeless: maintain multiple copies on disk (minimize
>>backup/restore lag)
>>
>>
>Can you really call it archival if you're willing to pay 5 times as much
>for quick access? Maybe you need a different word. Archive means large
>quantities of data with very low access frequency. And sometimes, in the
>current legal climate, with very low chance of destruction.
>
>You word this as if the only potential use of tape is backup of disk-based
>data, but it's also pretty useful as the primary copy of archival data.
>
>
I know some people swear by them, but our experience with tertiary (tape
and optical) storage systems has never been positive. (We have tried
several over the years, from several vendors.) Let's leave it at that
and just say we want to explore new territory.
There is also an argument that the cost of tape and disk is slowly
converging, or even crossing. Some disagree, but we find it an
interesting point.
For many users, the cost of archival storage is often dominated by
non-hardware costs. Our internal departmental recharge rates for (tape)
backed-up storage are on the order of $5/month to $10/month per GIGABYTE
of storage. That's $60/GB/year to $120/GB/year. Very little of that
cost is hardware. Considering that a GB of disk now costs $1 to $2 for
commodity disks, I can afford to keep several copies of my data online
for quick access when I do want it, especially when it is mostly
archival and doesn't change that often (almost never).
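Spelled out as back-of-the-envelope arithmetic (the three-copy replica
count and three-year drive life below are my illustrative assumptions,
not measured figures):

    /* cost_sketch.c -- back-of-the-envelope only. */
    #include <stdio.h>

    int main(void)
    {
        double recharge_low  = 5.0 * 12;   /* $5/GB/month  -> $/GB/year */
        double recharge_high = 10.0 * 12;  /* $10/GB/month -> $/GB/year */
        double disk_per_gb   = 2.0;        /* commodity disk, high end */
        int    copies        = 3;          /* assumed replica count */
        double drive_life    = 3.0;        /* assumed replacement cycle, years */

        double hw_per_gb_year = copies * disk_per_gb / drive_life;

        printf("recharge: $%.0f-$%.0f per GB per year\n",
               recharge_low, recharge_high);
        printf("disk for %d copies: about $%.0f per GB per year\n",
               copies, hw_per_gb_year);
        return 0;
    }

Even keeping three replicated copies and replacing the drives every three
years, the hardware comes to roughly $2/GB/year against a $60/GB/year to
$120/GB/year recharge rate.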
Jeff Anderson-Lee
Petabyte Storage Infrastructure Project
University of California Berkeley
* Re: petabyte class archival filestore wanted/proposed
From: Jeff Garzik @ 2006-06-22 19:53 UTC (permalink / raw)
To: Jeff Anderson-Lee; +Cc: linux-fsdevel, linux-kernel
Jeff Anderson-Lee wrote:
> I'm part of a project at the University of California, Berkeley that is
> trying to put together a predominantly archival file system for
> petabyte-class data stores using Linux with clusters of commodity server
> hardware. We currently have multiple terabytes of hardware on top of
> which we intend to build such a system. However, our hope is that the
> end system would be useful for a wide range of users, from someone with
> three large disks or three disk servers to groups with three or more
> distributed storage sites.
>
> Main Goals/Features:
> 1) Tapeless: maintain multiple copies on disk (minimize
> backup/restore lag)
> 2) "Mirroring" across remote sites: for disaster recovery (we sit on
> top of the Hayward Fault)
> 3) Persistent snapshots: as archival copies instead of backup/restore
> scanning
> 4) Copy-On-Write: in support of snapshots/archives
> 5) Append-mostly log structured file system: make synchronization of
> remote mirrors easier (tail the log).
> 6) Avoid (insofar as possible) single point of failure and
> bottlenecks (for scalability)
>
> I've looked into the existing file systems I know about, and none of
> them seem to fit the bill.
>
> Parts of the OpenSolaris ZFS file system look interesting, except that
> (a) it is not on Linux and (b) it seems to mix together too many levels
> (volume manager and file system). However, I can see how using some of
> its concepts and implementing something like it on top of an
> append-mostly distributed logical device might work. Splitting the
> project into two parts, (a) a robust, distributed logical block device
> and (b) a flexible file system with snapshots, might make it easier to
> design and build.
>
> Before we begin however, it is important to find out:
> 1) Is there anything sufficiently like this to either (a) use
> instead, or (b) start from.
> 2) Is there community support for insertion in the main kernel tree
> (without which it is just another toy project)?
> 3) Anyone care to join in (a) design, (b) implementation, or (c)
> testing?
I would recommend checking out Venti:
http://cm.bell-labs.com/sys/doc/venti.html
* Re: petabyte class archival filestore wanted/proposed
From: Jeff Anderson-Lee @ 2006-06-22 20:29 UTC (permalink / raw)
To: Jeff Garzik; +Cc: linux-fsdevel, linux-kernel
Jeff Garzik wrote:
> Jeff Anderson-Lee wrote:
>
>> I'm part of a project at the University of California, Berkeley that
>> is trying to put together a predominantly archival file system for
>> petabyte-class data stores using Linux with clusters of commodity
>> server hardware. We currently have multiple terabytes of hardware on
>> top of which we intend to build such a system. However, our hope is
>> that the end system would be useful for a wide range of users, from
>> someone with three large disks or three disk servers to groups with
>> three or more distributed storage sites.
>>
>> Main Goals/Features:
>> 1) Tapeless: maintain multiple copies on disk (minimize
>> backup/restore lag)
>> 2) "Mirroring" across remote sites: for disaster recovery (we sit
>> on top of the Hayward Fault)
>> 3) Persistent snapshots: as archival copies instead of
>> backup/restore scanning
>> 4) Copy-On-Write: in support of snapshots/archives
>> 5) Append-mostly log structured file system: make synchronization
>> of remote mirrors easier (tail the log).
>> 6) Avoid (insofar as possible) single point of failure and
>> bottlenecks (for scalability)
>>
>> I've looked into the existing file systems I know about, and none of
>> them seem to fit the bill.
>>
>> Parts of the OpenSolaris ZFS file system look interesting, except
>> that (a) it is not on Linux and (b) it seems to mix together too many
>> levels (volume manager and file system). However, I can see how using
>> some of its concepts and implementing something like it on top of an
>> append-mostly distributed logical device might work. Splitting the
>> project into two parts, (a) a robust, distributed logical block device
>> and (b) a flexible file system with snapshots, might make it easier
>> to design and build.
>>
>> Before we begin however, it is important to find out:
>> 1) Is there anything sufficiently like this to either (a) use
>> instead, or (b) start from.
>> 2) Is there community support for insertion in the main kernel
>> tree (without which it is just another toy project)?
>> 3) Anyone care to join in (a) design, (b) implementation, or (c)
>> testing?
>
>
> I would recommend checking out Venti:
> http://cm.bell-labs.com/sys/doc/venti.html
Yes, I've seen that and like some of the ideas. There is no GPL Linux
implementation of Venti that I know of.
Jeff Anderson-Lee
* Re: petabyte class archival filestore wanted/proposed
From: Bryan Henderson @ 2006-06-23 0:57 UTC (permalink / raw)
To: Jeff Anderson-Lee; +Cc: linux-fsdevel
>For many users, the cost of archival storage is often dominated by
>non-hardware costs. Our internal departmental recharge rates for (tape)
>backed-up storage are on the order of $5/month to $10/month per GIGABYTE
>of storage. That's $60/GB/year to $120/GB/year. Very little of that
>cost is hardware. Considering that a GB of disk now costs $1 to $2 for
>commodity disks, I can afford to keep several copies of my data online
>for quick access when I do want it, especially when it is mostly
>archival and doesn't change that often (almost never).
You seem to be mixing apples and oranges -- looking on the one hand at the
total cost of storage service and on the other at the cost of a disk drive
on a shelf. At $1 per gigabyte, the disk drive on a shelf costs
$.005/GB/month, whereas when that drive is used to provide storage
service, the service costs at least $5/GB/month. Unless $4.995 of that is
for the backup (which you don't need when the disk _is_ the backup), I
don't see these numbers saying anything about the cost of disk vs. tape.
My guess is that no more than $2 of that $5 is for backup service.
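Put as arithmetic, using only the figures already quoted in this message
(the $2 backup share is, as stated above, a guess):

    /* fraction.c -- restating the comparison numerically. */
    #include <stdio.h>

    int main(void)
    {
        double media_per_gb_month   = 0.005;  /* the drive on a shelf */
        double service_per_gb_month = 5.0;    /* the departmental recharge rate */
        double backup_share_guess   = 2.0;    /* "no more than $2 of that $5" */

        printf("media is %.1f%% of the service charge\n",
               100.0 * media_per_gb_month / service_per_gb_month);
        printf("at least $%.2f/GB/month of the charge is not backup\n",
               service_per_gb_month - backup_share_guess);
        return 0;
    }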
BTW, it costs IBM around $20/GB/month for internal storage service, and
it's been pretty much unchanged for the last 10 years.
--
Bryan Henderson                          IBM Almaden Research Center
San Jose CA                              Filesystems
* Re: petabyte class archival filestore wanted/proposed
From: Andreas Dilger @ 2006-06-23 4:26 UTC (permalink / raw)
To: Jeff Anderson-Lee; +Cc: linux-fsdevel, linux-kernel, pbojanic, jeff
On Jun 22, 2006 09:43 -0700, Jeff Anderson-Lee wrote:
> I'm part of a project at the University of California, Berkeley that is
> trying to put together a predominantly archival file system for
> petabyte-class data stores using Linux with clusters of commodity server
> hardware. We currently have multiple terabytes of hardware on top of
> which we intend to build such a system. However, our hope is that the
> end system would be useful for a wide range of users, from someone with
> three large disks or three disk servers to groups with three or more
> distributed storage sites.
>
> Main Goals/Features:
> 1) Tapeless: maintain multiple copies on disk (minimize
> backup/restore lag)
> 2) "Mirroring" across remote sites: for disaster recovery (we sit on
> top of the Hayward Fault)
> 3) Persistent snapshots: as archival copies instead of
> backup/restore scanning
> 4) Copy-On-Write: in support of snapshots/archives
> 5) Append-mostly log structured file system: make synchronization of
> remote mirrors easier (tail the log).
> 6) Avoid (insofar as possible) single point of failure and
> bottlenecks (for scalability)
>
> I've looked into the existing file systems I know about, and none of
> them seem to fit the bill.
> Before we begin however, it is important to find out:
> 1) Is there anything sufficiently like this to either (a) use
> instead, or (b) start from.
> 2) Is there community support for insertion in the main kernel tree
> (without which it is just another toy project)?
> 3) Anyone care to join in (a) design, (b) implementation, or (c)
> testing?
Lustre isn't quite where you want to be yet, but features like mirroring
(closer), snapshots, and disconnected operation+resync (further out) are
all on our roadmap.
Lustre is GPL. If you are interested in contributing to it, we are happy
to work with you. There is a "lustre.org" (non-CFS Lustre development)
planning session next week in Boulder, CO (with a telecon) that you could
join if you are interested. Please email pbojanic@clusterfs.com.
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
Thread overview: 7+ messages
2006-06-22 16:43 petabyte class archival filestore wanted/proposed Jeff Anderson-Lee
2006-06-22 18:19 ` Bryan Henderson
2006-06-22 18:58 ` Jeff Anderson-Lee
2006-06-23 0:57 ` Bryan Henderson
2006-06-22 19:53 ` Jeff Garzik
2006-06-22 20:29 ` Jeff Anderson-Lee
2006-06-23 4:26 ` Andreas Dilger