* [Ocfs2-devel] OCFS2 features RFC
From: Mark Fasheh @ 2006-04-25 18:35 UTC
  To: ocfs2-devel

The OCFS2 team is in the preliminary stages of planning major features for
our next cycle of development. The goal of this e-mail then is to stimulate
some discussion as to how features should be prioritized going forward. Some
disclaimers apply:

* The following list is very preliminary and is sure to change.

* I've probably missed some things.

* Development priorities within Oracle can be influenced but are ultimately
  up to management. That's not stopping anyone from contributing though, and
  patches are always welcome.

So I'll start with changes that can be completely contained within the file
system (no cluster stack changes needed):

-Sparse file support: Self explanatory. We need this for various reasons
 including performance, correctness and space usage.

-Htree support

-Extended attributes: This might be another area where we
 steal^H^H^H^H^Hcopy some good code from Ext3 :) On top of this one can
 trivially implement POSIX ACLs. We're not likely to support EA block
 sharing though, as it becomes difficult to manage across the cluster.

-Removal of the vote mechanism: The most trivial network votes, the dentry
 type ones, can be removed quite easily by replacing them with a cluster
 lock. This is critical in speeding up unlink and rename operations in the
 cluster. The remaining votes (mount, unmount, delete_inode) look like
 they'll require cluster stack adjustments.

-Data in inode blocks: Should speed up local node data operations with small
 files significantly.

-Shared writeable mmap: This looks like it might require changes to the
 kernel (outside of OCFS2). We need to investigate further...
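
For readers unfamiliar with the first item, a sparse file contains unallocated
"holes" that read back as zeros. The following is a minimal userspace sketch
using only POSIX calls. The helper names (demo_make_sparse, demo_hole_is_zero)
are hypothetical and for illustration only; nothing here is OCFS2 code.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write one byte at `offset`, leaving a hole before it.
 * Returns the resulting file size, or -1 on error. */
static off_t demo_make_sparse(const char *path, off_t offset)
{
    struct stat st;
    int fd;

    assert(path != NULL);
    fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (pwrite(fd, "x", 1, offset) != 1 || fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return st.st_size;
}

/* Hypothetical helper: returns 1 if the byte at `offset` reads as zero,
 * 0 if it does not, and -1 on error or short read. */
static int demo_hole_is_zero(const char *path, off_t offset)
{
    char c = 1;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    n = pread(fd, &c, 1, offset);
    close(fd);
    if (n != 1)
        return -1;
    return c == 0;
}
```

Calling demo_make_sparse(path, 1 << 20) yields a file whose size is one byte
past 1 MiB, and any offset inside the hole reads as zero. On a file system with
sparse file support, st_blocks from fstat() would stay far below what that size
implies; without it, the whole range has to be allocated and zeroed.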

Now on to file system features which require cluster stack changes. I'll
have a lot more to say about the cluster stack in a bit, but it's worth
listing these out here for completeness.

-Cluster consistent Flock / Lockf

-Online file system resize

-Removal of remaining FS votes: If we can get rid of the delete_inode vote,
 I don't believe we'll need the mount / umount ones anymore (and if we still
 do, then a proper group services implementation could handle that).

-Allow the file system to go "hard read only" when it loses its connection
 to the disk, rather than the kernel panic we have today. This allows
 applications using the file system to gracefully shut down. Other
 applications on the system continue unharmed. "Hard read only" in the OCFS2
 context means that the RO node does not look mounted to the other nodes on
 that file system. Absolutely no disk writes are allowed. File data and
 metadata can be stale or otherwise invalid. We never want to return
 invalid data to userspace, so file reads return -EIO.
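
The "hard read only" behaviour described above can be modelled in a few lines
of userspace C. This is only a sketch of the stated semantics, not OCFS2 code:
the demo_mount structure and the demo_* functions are hypothetical, and
returning -EROFS for refused writes is an assumption (the text above only says
that writes are disallowed and that reads return -EIO).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical per-mount state; not an actual OCFS2 structure. */
struct demo_mount {
    bool hard_ro;     /* set once the node loses its connection to the disk */
    char data[64];    /* stand-in for file data */
    size_t len;       /* number of valid bytes in data */
};

/* Reads fail with -EIO once hard read-only, since data may be stale. */
static ssize_t demo_read(const struct demo_mount *m, char *buf, size_t count)
{
    assert(m != NULL && buf != NULL);
    if (m->hard_ro)
        return -EIO;
    if (count > m->len)
        count = m->len;
    memcpy(buf, m->data, count);
    return (ssize_t)count;
}

/* Writes are refused outright; -EROFS is an assumption for illustration. */
static ssize_t demo_write(struct demo_mount *m, const char *buf, size_t count)
{
    assert(m != NULL && buf != NULL);
    if (m->hard_ro)
        return -EROFS;
    if (count > sizeof(m->data))
        count = sizeof(m->data);
    memcpy(m->data, buf, count);
    m->len = count;
    return (ssize_t)count;
}

/* Called when the disk connection is lost, instead of panicking. */
static void demo_lose_disk(struct demo_mount *m)
{
    assert(m != NULL);
    m->hard_ro = true;
}
```

After demo_lose_disk() every read and write fails immediately, so an
application sees errors it can handle and shut down on, while the rest of the
system keeps running.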

As far as the existing cluster stack goes, currently most of the OCFS2 team
feels that the code has gone as far as it can and should go. It would
therefore be prudent to allow pluggable cluster stacks. Jeff Mahoney at
Novell has already done some integration work implementing a userspace
clustering interface. We probably want to do more in that area though.

There are several good reasons why we might want to integrate with external
cluster stacks. The most obvious is code reuse. The list of cluster stack
features we require for our next phase of development is very large (some
are listed below). There is no reason to implement those features unless
we're certain existing software doesn't provide them and can't be extended.
This will also allow a greater amount of choice for the end user. What stack
works well for one environment might not work as well for another. There's
also the fact that current resources are limited. It's enough work designing
and implementing a file system. If we can get out of the business of
maintaining a cluster stack, we should do so.

So the question then becomes, "What is it that we require of our cluster
stack going forward?"

- We'd like as much of it to be user space code as is possible and
  practical.

- The node manager should support dynamic cluster topology updates,
  including removing nodes from the cluster, propagating new configurations to
  existing nodes, etc.

- A pluggable fencing mechanism is a priority.

- We'd like some group services implementation to handle things like
  membership of a mount point, dlm domain/lockspace, etc.

- On the DLM side, we'd like things like directory based mastery, a range
  locking API, and some extra LVB recovery bits.

So that's it for now. Hopefully this will spur some interesting discussion.
Please keep in mind that any of this is subject to change - cluster stack
requirements especially are things we've only recently begun discussing.
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh at oracle.com

* [Ocfs2-devel] OCFS2 Features RFC
From: Brian Long @ 2006-05-02 18:22 UTC
  To: ocfs2-devel

Hello,

I just subscribed to this list because I saw this posting in the
archives:
http://oss.oracle.com/pipermail/ocfs2-devel/2006-April/000931.html

Is there any reason you wouldn't ask the ocfs2-users community for
feedback on features as well?  I hadn't subscribed to -devel because I
figured it was solely for folks actually developing the OCFS2 code  :)

In my opinion, the proposed feature about "hard read only" is the most-
wanted.  My team is in the middle of testing 10gR2 RAC on OCFS2 for
production deployments on RHEL 4 (hopefully your x86_64 certification is
coming soon).  I assume Oracle RAC would like the "hard read only" more
than the current panic.

Also, while I saw one end user complain about your ideas of implementing
ext3 code inside OCFS2, please remember the rest of us that survive just
fine with ext3 in Red Hat's Enterprise Linux.  :)

Third, are there any thoughts on integrating LVM support or using
something like Red Hat's CLVM to allow OCFS2 to layer on top of LVs
instead of just individual disks?

The biggest drawback I see in my environment is that my storage team
provides 34GB and 68GB metas from the EMC Frames.  I'd rather not have a
ton of 68GB OCFS2 filesystems but rather a larger, host-controlled LV.
Trying to get the storage team to provide a 200+GB LUN and grow it on
the fly in the future is a tough task.  If I could control the LV on the
host _and_ grow OCFS2 into larger LVs, that would rock.

Thanks.

/Brian/
-- 
       Brian Long                      |         |           |
       IT Data Center Systems          |       .|||.       .|||.
       Cisco Linux Developer           |   ..:|||||||:...:|||||||:..
       Phone: (919) 392-7363           |   C i s c o   S y s t e m s
