public inbox for linux-kernel@vger.kernel.org
* [RFC] LVM/RAID/FS integration
@ 2007-06-27 23:09 Kent Overstreet
  2007-06-29  9:39 ` Andi Kleen
  2007-07-09 18:09 ` Phillip Susi
  0 siblings, 2 replies; 3+ messages in thread
From: Kent Overstreet @ 2007-06-27 23:09 UTC (permalink / raw)
  To: Linux Kernel Mailing List

I believe I have the right way of going about LVM/RAID/FS integration.
I agree with the kernel developers who've stated that ZFS is a
layering violation, and we can do better.

Consider a filesystem on a set of drives; it may want some data to be
in a raid5 and some to be mirrored. The correct interface for the
filesystem is to have two or more opaque volumes, which it may be able
to resize depending on usage.

Basically, the order we want is fs -> raid -> lvm. Given a set of
identical drives, we want LVM to handle them separately and divide
them up into LVs identically; then corresponding LVs are raided
together. We might have a raid5 volume and a raid1 volume, each of
which can be resized independently. The FS then makes use of both of
them, putting metadata and frequently used data on the raid1 and
everything else on the raid5 (btrfs, as I understand it, will be able
to do this sort of thing eventually).

The trouble is it's completely impractical with current tools; we need
tighter integration between md and LVM. Basically, we need a new type
of VG; you'd only be able to make it out of PVs that are the same size
(or close). Then, when you create an LV, you decide what kind of
redundancy you want; LVM internally creates multiple LVs and raids
them together to make the single LV the fs sees.
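
To see why this is painful today, here is roughly the closest
approximation with current tools (a sketch; the three same-size
drives, device names, and sizes are all illustrative):

```shell
# One VG per drive, so LVM never spreads an LV across drives itself
# (device names are made up for illustration).
pvcreate /dev/sda1 /dev/sdb1 /dev/sdc1
vgcreate vg_a /dev/sda1
vgcreate vg_b /dev/sdb1
vgcreate vg_c /dev/sdc1

# Identical LVs on each drive...
for vg in vg_a vg_b vg_c; do
    lvcreate -L 100G -n r5 $vg
    lvcreate -L 10G  -n r1 $vg
done

# ...then raid the corresponding LVs together.
mdadm --create /dev/md0 --level=5 --raid-devices=3 \
      /dev/vg_a/r5 /dev/vg_b/r5 /dev/vg_c/r5
mdadm --create /dev/md1 --level=1 --raid-devices=3 \
      /dev/vg_a/r1 /dev/vg_b/r1 /dev/vg_c/r1
```

Growing /dev/md0 then means running lvextend on each of the three
component LVs by hand, followed by mdadm --grow; nothing keeps the
layers in sync, which is exactly the impracticality above.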

This could, sort of, be done in userspace, but I think it'll work
better if dm and md are better integrated in the kernel. I thought I'd
throw out my ideas before I get too far in. Thoughts?

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC] LVM/RAID/FS integration
  2007-06-27 23:09 [RFC] LVM/RAID/FS integration Kent Overstreet
@ 2007-06-29  9:39 ` Andi Kleen
  2007-07-09 18:09 ` Phillip Susi
  1 sibling, 0 replies; 3+ messages in thread
From: Andi Kleen @ 2007-06-29  9:39 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: Linux Kernel Mailing List

"Kent Overstreet" <kent.overstreet@gmail.com> writes:
> 
> Basically, the order we want is fs -> raid -> lvm. Given a set of
> identical drives, we want LVM to handle them separately and divide
> them up into LVs identically; then corresponding LVs are raided
> together. We might have a raid5 volume and a raid1 volume, each of
> which can be resized independently. The FS then makes use of both of
> them, putting metadata and frequently used data on the raid1 and
> everything else on the raid5 (btrfs, as I understand it, will be able
> to do this sort of thing eventually).

XFS can do it already using its realtime volume feature. It keeps the
metadata on the original device and puts the data on another device.
The log can also be on another device (but other file systems support
that too).
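
For reference, such a setup looks roughly like this (device names are
made up for illustration):

```shell
# Data, log, and realtime sections on three separate devices.
mkfs.xfs -l logdev=/dev/vg0/log -r rtdev=/dev/vg0/rt /dev/vg0/data
mount -o logdev=/dev/vg0/log,rtdev=/dev/vg0/rt /dev/vg0/data /mnt

# Metadata always stays on the data device; file data goes to the
# realtime device for files with the realtime bit set (settable via
# xfs_io's chattr command, and inheritable from the parent directory).
```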

People often complain about the code size and complexity
of XFS, but it actually implements a lot of functionality
that other file systems don't have.

Arguably it's all not too easy to configure, and it's a little awkward
to have up to 3 LVs per fs.

And there is no concept of "frequently used data" (which would
probably be hard anyway; how would you move data once it becomes
frequently used?). Out-of-tree XFS actually supports that using DMAPI
and an external manager for automatic migration of files to tape
robots, but you probably don't want to go that way. DMAPI is pretty
ugly.

> The trouble is it's completely impractical with current tools; we need
> tighter integration between md and LVM. 

The current direction seems to be to slowly duplicate all MD
functionality in DM, but then there were also some movements
to add a little volume manager to MD. We'll see who wins.
They both move relatively slowly.

> throw out my ideas before I get too far in. Thoughts?

Some nicer high-level userland utilities that know how to integrate
all this would probably be a good idea.

That seems to be the main appeal of ZFS anyway --
simpler user interfaces.

-Andi


* Re: [RFC] LVM/RAID/FS integration
  2007-06-27 23:09 [RFC] LVM/RAID/FS integration Kent Overstreet
  2007-06-29  9:39 ` Andi Kleen
@ 2007-07-09 18:09 ` Phillip Susi
  1 sibling, 0 replies; 3+ messages in thread
From: Phillip Susi @ 2007-07-09 18:09 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: Linux Kernel Mailing List

Kent Overstreet wrote:
> The trouble is it's completely impractical with current tools; we need
> tighter integration between md and LVM. Basically, we need a new type
> of VG; you'd only be able to make it out of PVs that are the same size
> (or close). Then, when you create an LV you decide what kind of
> redundancy you want; the LVM internally creates multiple LVs and raids
> them together to make the LV the fs sees.

In theory LVM is meant to handle the raid bits as well, once the mirror 
target is improved to properly handle error correction and the raid5 
target stabilizes.  Then md won't be needed; you will just need an 
interface between the filesystem and lvm.

> This could, sort of, be done in userspace, but I think it'll work
> better if dm and md are better integrated in the kernel. I thought i'd
> throw out my ideas before I get too far in. Thoughts?

This is the kind of thing that should go in userspace.  The filesystem 
should just fire off events that tell a userspace daemon things like 
"the raid5 volume is now at x% use" and let the daemon choose the 
policy, possibly by asking lvm to resize the underlying volume and 
then sending a message back to the fs that the volume has been 
resized, so it can start using the extra space.
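
The loop described above could be sketched like this (hypothetical:
the event format, names, and the 80% threshold are made up for
illustration; a real daemon would receive events from the kernel and
actually run the lvm commands):

```shell
#!/bin/sh
# Hypothetical policy daemon sketch: the fs reports "volume, percent
# used" events and userspace decides whether to grow the LV.
THRESHOLD=80

handle_event() {
    vol=$1              # volume name, as reported by the fs
    pct=$2              # percent used, as reported by the fs
    if [ "$pct" -ge "$THRESHOLD" ]; then
        # A real daemon would run something like:
        #   lvextend -L +1G "$vol"
        # and then notify the fs that the volume has grown.
        echo "grow $vol"
    else
        echo "ok $vol"
    fi
}

handle_event /dev/vg0/raid5 90    # prints "grow /dev/vg0/raid5"
handle_event /dev/vg0/raid1 40    # prints "ok /dev/vg0/raid1"
```

All of the policy (the threshold, how much to grow by) lives in this
one userspace process; the kernel only reports usage and performs the
resize it is asked for.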


