* [Ocfs2-devel] Extended Attribute Support?
@ 2006-06-07 11:35 EKC
2006-06-07 16:45 ` Mark Fasheh
0 siblings, 1 reply; 6+ messages in thread
From: EKC @ 2006-06-07 11:35 UTC (permalink / raw)
To: ocfs2-devel
Hello,
Any word on when extended attribute support will be added to OCFS2?
What are the impediments to someone implementing this?
Alternatively, has anyone implemented a patch to add extended
attribute support to OCFS2?
I've been watching the open source Lustre fs development, too. I
noticed that they have added extended attribute support. However,
OCFS2 is more attractive for some applications.
Thanks
* [Ocfs2-devel] Extended Attribute Support?
From: Mark Fasheh @ 2006-06-07 16:45 UTC (permalink / raw)
To: ocfs2-devel
On Wed, Jun 07, 2006 at 04:35:31AM -0700, EKC wrote:
> Any word on when extended attribute support will be added to OCFS2?
When we get to it ;) Seriously, though, EA support is being planned; it's
just lower on the priority list than things like sparse file support, online
resize, directory improvements, etc.
> What are the impediments to someone implementing this?
Not much - downloading the latest kernel code and editing fs/ocfs2/ to add
this :) Patches are always welcome, and I actually think EA is a good
candidate for someone who wants to contribute but doesn't yet understand
all the cluster stuff.
As a first pass, a simple attribute block pointed to by an __le64 in the
dinode would work fine. More improvements and optimizations could come after
that.
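A minimal user-space sketch of that first-pass layout, assuming a hypothetical on-disk format: the inode carries one little-endian 64-bit block number (0 meaning "no attribute block yet"), and that block holds a small header followed by packed name/value entries that are scanned linearly. The names `sketch_dinode`, `i_xattr_blk`, `ea_set`, `ea_get`, and the entry layout are illustrative only, not OCFS2's actual structures, and there is no locking, journaling, replace, or delete.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EA_BLOCK_SIZE 4096
#define EA_HDR 4 /* hypothetical header: le16 entry count + le16 bytes used */

/* Hypothetical inode fragment: only the proposed __le64, kept as raw
 * little-endian bytes. A value of 0 means "no attribute block yet". */
struct sketch_dinode {
    uint8_t i_xattr_blk[8];
};

static void put_le16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); }
static uint16_t get_le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

static void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

static uint64_t get_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Append one entry: u8 name_len, le16 value_len, name bytes, value bytes.
 * Returns 0, or -1 if the name is invalid or the block is full.
 * First-pass sketch: no replace, delete, or duplicate-name check. */
static int ea_set(uint8_t *blk, const char *name, const void *val, uint16_t val_len)
{
    size_t name_len = strlen(name);
    uint16_t count = get_le16(blk);
    uint16_t used = get_le16(blk + 2);
    size_t need = 3 + name_len + val_len;

    if (name_len == 0 || name_len > 255 || EA_HDR + used + need > EA_BLOCK_SIZE)
        return -1;

    uint8_t *p = blk + EA_HDR + used;
    p[0] = (uint8_t)name_len;
    put_le16(p + 1, val_len);
    memcpy(p + 3, name, name_len);
    memcpy(p + 3 + name_len, val, val_len);
    put_le16(blk, (uint16_t)(count + 1));
    put_le16(blk + 2, (uint16_t)(used + need));
    return 0;
}

/* Linear scan for `name`; copies the value into `out` and returns its
 * length, or -1 if absent or `out` is too small. Trusts the block contents. */
static int ea_get(const uint8_t *blk, const char *name, void *out, size_t out_len)
{
    size_t name_len = strlen(name);
    uint16_t count = get_le16(blk);
    const uint8_t *p = blk + EA_HDR;

    for (uint16_t i = 0; i < count; i++) {
        uint8_t nl = p[0];
        uint16_t vl = get_le16(p + 1);
        if (nl == name_len && memcmp(p + 3, name, nl) == 0) {
            if (vl > out_len)
                return -1;
            memcpy(out, p + 3 + nl, vl);
            return vl;
        }
        p += 3 + nl + vl;
    }
    return -1;
}
```

A first setxattr on an inode would allocate a block, record its number with `put_le64(di.i_xattr_blk, blkno)`, and call `ea_set()` on that block's buffer; getxattr reads the number back with `get_le64()` and scans with `ea_get()`. Encoding the bytes explicitly keeps the format identical on big- and little-endian nodes sharing the disk, which is the point of an `__le64` rather than a native integer.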
> I've been watching the open source Lustre fs development, too. I
> noticed that they have added extended attribute support. However,
> OCFS2 is more attractive for some applications.
Don't they get EA from ext3? Anyway, yeah Lustre does some really cool stuff
- I think we're pretty different in the types of clusters we serve though :)
--Mark
--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh at oracle.com
* [Ocfs2-devel] Extended Attribute Support?
From: EKC @ 2006-06-08 0:47 UTC (permalink / raw)
To: ocfs2-devel
Speaking of Lustre, how does OCFS2 compare in terms of scalability?
My understanding of OCFS2 is that it is limited to a maximum of 254
cluster nodes. However, most of the OCFS2 documentation that I've read
uses node slots per volume in the single digits. Are there any
practical limitations to using 254 node slots per volume on OCFS2, and
creating an OCFS2 cluster with 254 nodes (each node with 254 volumes
mounted on it)?
Since OCFS2 doesn't provide a unified namespace amongst volumes, I
would like to be able to mount the same volume across all of my
cluster nodes (up to 254). OCFS2 is attractive because of how clean
the code is, and its inclusion in the mainline kernel.
Thanks again
* [Ocfs2-devel] Extended Attribute Support?
From: Mark Fasheh @ 2006-06-08 3:46 UTC (permalink / raw)
To: ocfs2-devel
On Wed, Jun 07, 2006 at 06:47:13PM -0600, EKC wrote:
> Speaking of Lustre, how does OCFS2 compare in terms of scalability?
I'm no Lustre expert, so please take what I say with a grain of salt :) That
said, Lustre seems to target the very high end of things (thousands of
nodes), where OCFS2 is much more limited.
> My understanding of OCFS2 is that it is limited to a maximum of 254
> cluster nodes. However, most of the OCFS2 documentation that I've read
> uses node slots per volume in the single digits. Are there any
> practical limitations to using 254 node slots per volume on OCFS2, and
> creating an OCFS2 cluster with 254 nodes (each node with 254 volumes
> mounted on it)?
We test regularly on 16-node clusters here at Oracle. You would be correct,
however, that the majority of usage we see is on the scale of tens of nodes.
As far as practical limitations to scaling go, I think it may depend on your
usage. What is your intended application for the cluster? Also, I'm curious
as to what your shared storage will reside on.
Off the top of my head, issues that might arise in a large cluster include
disk heartbeat overhead, lock mastery, and, if you're doing lots of
concurrent metadata updates to shared directories/files, a performance hit
as the metadata is synced to disk.
> Since OCFS2 doesn't provide a unified namespace amongst volumes, I
> would like to be able to mount the same volume across all of my
> cluster nodes (up to 254). OCFS2 is attractive because of how clean
> the code is, and its inclusion in the mainline kernel.
Well, thanks for the kind words regarding our code :) By the way, would you
be using mainline kernels, or something provided by a distribution vendor
(e.g., SUSE, Red Hat, etc.)?
--Mark
* [Ocfs2-devel] Extended Attribute Support?
From: EKC @ 2006-06-08 4:43 UTC (permalink / raw)
To: ocfs2-devel
I'm using a mainline kernel (2.6.16.20) that I've patched to support
Linux Vserver (http://www.linux-vserver.org). However, Linux Vserver
has some unsatisfied dependencies on extended attributes (to enable
copy-on-write, chroot-like jails, and disk quotas), so my plan is to
patch OCFS2 to support Linux Vserver.
I have a fourteen-node cluster of dual dual-core Opterons with local
SATA disks that I was using for Lustre. For OCFS2, my plan is to use AoE
(ATA over Ethernet) with DRBD mirroring between pairs of nodes (each
node has two disks).
I am using the cluster to distribute the load for self-contained
database-backed (MySQL, Berkeley DB, and O_APPEND/mmap flat-file
"databases") applications, each of which is hosted in its own vserver.
If a node dies or resources become available elsewhere, the vserver is
shut down on one node and launched on another. A vserver instance
cannot run on more than one node at a time. The cluster FS is used to
enable this migration of vservers from one node to another.
This use case becomes complicated because I need to quickly "clone"
vservers. I've looked at layering unionfs or cowloop on top of a
cluster FS. However, my preference is to use vserver's COW support
(hard-link two files and flag them as 'immutable' and 'unlink'; break
the link and copy the file on write, chmod, or chown).
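The break-on-write step can be illustrated with a small POSIX sketch. This is a user-space approximation under stated assumptions, not how vserver implements it (the thread doesn't describe that): `cow_break_link` is a hypothetical helper that, when a file has more than one hard link, copies it to a temporary name in the same directory and `rename()`s the copy over the original path, so only that directory entry moves to a new inode while the other links keep the shared data. It ignores the immutable/unlink flags, ownership, and extended attributes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: give `path` a private copy of its data if the inode
 * is shared with other hard links. Returns 0 on success, -1 on error. */
static int cow_break_link(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    if (st.st_nlink <= 1)
        return 0; /* not shared: safe to write in place */

    char tmp[4096];
    if (snprintf(tmp, sizeof tmp, "%s.cowtmp", path) >= (int)sizeof tmp)
        return -1;

    int in = open(path, O_RDONLY);
    if (in < 0)
        return -1;
    /* Same directory as the original, so rename() below stays on one fs. */
    int out = open(tmp, O_WRONLY | O_CREAT | O_EXCL, st.st_mode & 07777);
    if (out < 0) {
        close(in);
        return -1;
    }

    char buf[65536];
    ssize_t n;
    int rc = 0;
    while ((n = read(in, buf, sizeof buf)) > 0) {
        /* Short writes are treated as errors to keep the sketch small. */
        if (write(out, buf, (size_t)n) != n) {
            rc = -1;
            break;
        }
    }
    if (n < 0)
        rc = -1;
    close(in);
    if (close(out) != 0)
        rc = -1;

    /* rename() repoints only this directory entry at the new inode; the
     * other hard links still reference the original, unmodified data. */
    if (rc == 0 && rename(tmp, path) != 0)
        rc = -1;
    if (rc != 0)
        unlink(tmp);
    return rc;
}
```

A caller would invoke `cow_break_link(path)` just before opening the file for writing or changing its mode or owner. On a cluster filesystem the copy and rename are ordinary metadata updates, which is where the cache-coherency concern below comes in.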
My compute and storage needs are closely correlated. Filesystem reads
dominate writes (about 80% reads). Directories are shared between nodes
only in COW cases. Metadata operations and reads/writes have a lot of
spatial locality. Cache coherency becomes an issue with the COW
(hard-linking) requirement.
If I can find a way to quickly "clone" vserver directories without
COW, this whole thing becomes much simpler. Each vserver is basically a
1 GB Linux installation under one directory. I'm using two-port bonded
gigabit Ethernet on a single crossbar switch with jumbo (9k) frames
between nodes, and dedicated crossover gigabit Ethernet between DRBD pairs.
* [Ocfs2-devel] Extended Attribute Support?
From: Mark Fasheh @ 2006-06-09 4:08 UTC (permalink / raw)
To: ocfs2-devel
On Wed, Jun 07, 2006 at 09:43:47PM -0700, EKC wrote:
> I have a fourteen node cluster of dual dual-core opterons with local
> SATA disks that I was using for Lustre. My plan is to use aoe with
> DRBD mirroring between pairs of nodes (each node has two disks) for
> OCFS2.
OK, so what's the maximum node count per shared disk here, 2 or 14? Either
value is easily within the range of what we test with. The total number of
nodes in the cluster shouldn't really affect your performance.
> I am using the cluster to distribute the load for self-contained
> database-backed (mysql, Berkeley db, and o_append/mmap flat-file
> "databases") applications, each of which is hosted in its own vserver.
So one thing we don't support yet is shared writable mmap. It looks like we
might have that soon, though, as David Howells has a patch in -mm which
makes that a lot easier to implement in OCFS2.
> This use case becomes complicated because I need to quickly "clone"
> vservers. I've looked at layering unionfs or cowloop on top of a
> cluster fs. However, my preference is to use vserver's COW support
> (hard-link two files and flag them as 'immutable' and 'unlink'; break
> the link and copy the files on write,chmod,chown).
Having read the vserver home page, I'd say their COW support looks
promising, but I don't know how they implement it, so I can't comment on
how it would interact with OCFS2.
> My compute and storage needs are closely correlated. Filesystem reads
> dominate (80%) over writes. Directories are shared between nodes only
> in COW cases. Metadata operations and read/writes have a lot of
> spatial locality. Cache coherency becomes an issue with the COW
> (hardlinking) requirement.
OK, so the load is mostly reads, and with COW, writes look like they'll be
mostly node-local. This is an optimal load for OCFS2, as you will not have
many inodes that have to be passed between nodes.
So, taking into account your node count and expected load, it seems like you
probably have a fair chance of making that work, assuming we get the
features you're missing.
Well, you certainly have an interesting use case, and I hope that OCFS2 can
accommodate your needs.
--Mark