linux-fsdevel.vger.kernel.org archive mirror
* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
       [not found]     ` <20050822150505.7978136d.akpm@osdl.org>
@ 2005-08-24  7:18       ` Christoph Hellwig
  2005-08-24 20:33         ` Joel Becker
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2005-08-24  7:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Joel Becker, mark.fasheh, linux-fsdevel

> > > BTW, from where I sit, ocfs2 is "on hold" due to some additional work which
> > > hch identified when I was on vacation and not paying much attention.  vma
> > > walk, perhaps?
> > 
> > 	I don't know of anything that should put it "on hold".  Copying
> > Mark on this.  Mark?
> 
> (cc hch)
> 
> On 10 Aug Christoph told me "While OCFS is evolving really nicely there's a
> bunch of major things that need to be sorted out".
> 
> Christoph, could you please itemise these things?

Major items known:

 - oracore workarounds must go away
 - magic symlinks that pollute the posix filename namespace must go
   away
 - vma-walking locking must move to common code (zab is working on that
   afaik)
 - the buffered aio mess needs sorting out.  imho the best thing was
   to just drop that code from ocfs for now and let oracle work with
   bcrl and suparna to make sure their buffered aio code works nicely
   with ocfs and/or picks up some of their ideas
 - there's still some procfs abuse

That's just the off-the-top-of-my-head list.  The Oracle people actually asked
me to wait with a review until they've cleared their TODO lists; I'll
do a real review once I get some time.



* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-24  7:18       ` [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs] Christoph Hellwig
@ 2005-08-24 20:33         ` Joel Becker
  2005-08-25  9:58           ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Joel Becker @ 2005-08-24 20:33 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrew Morton, mark.fasheh, linux-fsdevel


On Wed, Aug 24, 2005 at 09:18:35AM +0200, Christoph Hellwig wrote:
>  - oracore workarounds must go away

	These are not part of the linus-ward submission.

>  - magic symlinks that pollute the posix filename namespace must go
>    away

	We're still trying to come up with a way to solve the problem
without magic symlinks.  Suggestions still welcome.

>  - vma-walking locking must move to common code (zab is working on that
>    afaik)

	The vma-walking will go away, replaced by another mmap scheme
entirely.  However, that's three or four months away.  The current code
is merely a stopgap for now. 
	Many folks have an interest in having a cluster filesystem in
mainline.  This seems like an issue that can be resolved later, not a
big blocker.  That is, it would be worth more to people to have it in
mainline for the next four months, knowing this will get fixed, than
keeping it out of mainline for four months over this feature.

>  - the buffered aio mess needs sorting out.  imho the best thing was

	Well, that's a mainline problem.  Yes, we should all work
towards improving mainline.  But again I'm not sure others are served
keeping OCFS2 out over this.

>  - there's still some procfs abuse

	Specifics of what is abuse vs OK would be interesting.

> That's just the off-the-top-of-my-head list.  The Oracle people actually asked
> me to wait with a review until they've cleared their TODO lists; I'll
> do a real review once I get some time.

	There are some sizeable things on this "top of the head" list
already.  I'd like to find a nice balance between "goes to mainline now"
and "must be a perfect piece of software before it goes in."  No
software will ever be perfect.

Joel

-- 

"For every complex problem there exists a solution that is brief,
     concise, and totally wrong."
                                        -Unknown

			http://www.jlbec.org/
			jlbec@evilplan.org



* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-24 20:33         ` Joel Becker
@ 2005-08-25  9:58           ` Christoph Hellwig
  2005-08-25 17:45             ` Mark Fasheh
  2005-08-25 18:45             ` Zach Brown
  0 siblings, 2 replies; 12+ messages in thread
From: Christoph Hellwig @ 2005-08-25  9:58 UTC (permalink / raw)
  To: Joel Becker; +Cc: Andrew Morton, mark.fasheh, linux-fsdevel

On Wed, Aug 24, 2005 at 01:33:52PM -0700, Joel Becker wrote:
> On Wed, Aug 24, 2005 at 09:18:35AM +0200, Christoph Hellwig wrote:
> >  - oracore workarounds must go away
> 
> 	These are not part of the linus-ward submission.

Ok.

> >  - magic symlinks that pollute the posix filename namespace must go
> >    away
> 
> 	We're still trying to come up with a way to solve the problem
> without magic symlinks.  Suggestions still welcome.

That's fine, you're free to come up with whatever problem you have of course
;-)  Doesn't mean we're gonna put the broken variant into mainline,
though.

> >  - vma-walking locking must move to common code (zab is working on that
> >    afaik)
> 
> 	The vma-walking will go away, replaced by another mmap scheme
> entirely.  However, that's three or four months away.  The current code
> is merely a stopgap for now. 
> 	Many folks have an interest in having a cluster filesystem in
> mainline.  This seems like an issue that can be resolved later, not a
> big blocker.  That is, it would be worth more to people to have it in
> mainline for the next four months, knowing this will get fixed, than
> keeping it out of mainline for four months over this feature.

I don't think it'll take four months, but we're setting a bad precedent
here - GFS pretty much duplicates the same mess and if we let that
in we're growing more and more of it.  Please get it right conceptually
first, it doesn't have to be perfect.

> >  - the buffered aio mess needs sorting out.  imho the best thing was
> 
> 	Well, that's a mainline problem.  Yes, we should all work
> towards improving mainline.  But again I'm not sure others are served
> keeping OCFS2 out over this.

Currently we don't support buffered aio on any filesystem in mainline,
so adding crufty code to mainline sounds like a bad idea.  Zab agreed
on that and wants to remove as much of it as possible.

> >  - there's still some procfs abuse
> 
> 	Specifics of what is abuse vs OK would be interesting.

You're using procfs for non-process data.

> > That's just the off-the-top-of-my-head list.  The Oracle people actually asked
> > me to wait with a review until they've cleared their TODO lists; I'll
> > do a real review once I get some time.
> 
> 	There are some sizeable things on this "top of the head" list
> already.  I'd like to find a nice balance between "goes to mainline now"
> and "must be a perfect piece of software before it goes in."  No
> software will ever be perfect.

Those are really the big things; it's certainly not perfect after that.

Note that it's often easier to drop unfinished/messy/controversial
features and do them right later.  That avoids the everything must
be perfect syndrome while keeping out the big mess.


* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-25  9:58           ` Christoph Hellwig
@ 2005-08-25 17:45             ` Mark Fasheh
  2005-08-28 22:48               ` Greg KH
  2005-08-25 18:45             ` Zach Brown
  1 sibling, 1 reply; 12+ messages in thread
From: Mark Fasheh @ 2005-08-25 17:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Joel Becker, Andrew Morton, linux-fsdevel, Wim Coekaerts

On Thu, Aug 25, 2005 at 11:58:19AM +0200, Christoph Hellwig wrote:
> > 	The vma-walking will go away, replaced by another mmap scheme
> > entirely.  However, that's three or four months away.  The current code
> > is merely a stopgap for now. 
> > 	Many folks have an interest in having a cluster filesystem in
> > mainline.  This seems like an issue that can be resolved later, not a
> > big blocker.  That is, it would be worth more to people to have it in
> > mainline for the next four months, knowing this will get fixed, than
> > keeping it out of mainline for four months over this feature.
> 
> I don't think it'll take four months, but we're setting a bad precedent
> here - GFS pretty much duplicates the same mess and if we let that
> in we're growing more and more of it.  Please get it right conceptually
> first, it doesn't have to be perfect.

We're fixing this by taking a completely different approach from what is
done today, which won't involve vma walking. It *will* take some time
however, at least in testing and validation. In the meantime I'd rather not
see us do all the work to port something to the VFS which we won't even be
using in a few months time.

> > >  - there's still some procfs abuse
> > 
> > 	Specifics of what is abuse vs OK would be interesting.
> 
> You're using procfs for non-process data.

I'm not sure I understand this... Looking through /proc I see lots of
subsystems using /proc in similar ways to us.  Or is there a very specific
method which you have a problem with?
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh@oracle.com



* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-25  9:58           ` Christoph Hellwig
  2005-08-25 17:45             ` Mark Fasheh
@ 2005-08-25 18:45             ` Zach Brown
  2005-08-25 20:23               ` Christoph Hellwig
  1 sibling, 1 reply; 12+ messages in thread
From: Zach Brown @ 2005-08-25 18:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Joel Becker, Andrew Morton, mark.fasheh, linux-fsdevel


> Currently we don't support buffered aio on any filesystem in
> mainline, so adding crufty code to mainline sounds like a bad idea.
> Zab agreed on that and wants to remove as much of it as possible.

Yeah, we aim to simplify this code.  For the record, it wasn't buffered
aio that was the problem.  There were two naughty moving parts:

First, trying not to block in the dlm when issuing aio ops and tracking
state to restart after dlm ops returned eiocbqueued.  This was just
overly aggressive.  This can behave like block mapping lookups in that
it rarely blocks.  Most aio that people care about (direct io writes to
already allocated regions) will simply be acquiring and releasing
shared-read locks around each op -- trivial local operations.

Second, trying to hold dlm locks around the entirety of aio ops. This
led to the mess of trying to tear down locks in the iocb dtor method.
(which can then race with unmount, aio does __fput on the filp, dropping
the vfsmount ref, before calling dtor.. bleh). We can get around this
by unlocking after performing the block mapping lookups and issuing the
io and introducing a cluster DLM lock which behaves like i_alloc_sem.

So, how about a patch that lets the fs provide a callback to
acquire/release i_alloc_sem at the current sites (dio, notify_change)
that work with it? Most file systems wouldn't provide a callback and
the code would just use the sem as usual, but clustered guys could use
dlm locking.

- z


* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-25 18:45             ` Zach Brown
@ 2005-08-25 20:23               ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2005-08-25 20:23 UTC (permalink / raw)
  To: Zach Brown
  Cc: Christoph Hellwig, Joel Becker, Andrew Morton, mark.fasheh,
	linux-fsdevel

On Thu, Aug 25, 2005 at 11:45:14AM -0700, Zach Brown wrote:
> Yeah, we aim to simplify this code.  For the record, it wasn't buffered
> aio that was the problem.  There were two naughty moving parts:
> 
> First, trying not to block in the dlm when issuing aio ops and tracking
> state to restart after dlm ops returned eiocbqueued.  This was just
> overly aggressive.  This can behave like block mapping lookups in that
> it rarely blocks.  Most aio that people care about (direct io writes to
> already allocated regions) will simply be acquiring and releasing
> shared-read locks around each op -- trivial local operations.
> 
> Second, trying to hold dlm locks around the entirety of aio ops. This
> led to the mess of trying to tear down locks in the iocb dtor method.
> (which can then race with unmount, aio does __fput on the filp, dropping
> the vfsmount ref, before calling dtor.. bleh). We can get around this
> by unlocking after performing the block mapping lookups and issuing the
> io and introducing a cluster DLM lock which behaves like i_alloc_sem.

You might want to look at XFS as a model for this.  While it's not
clustered it has its own r/w semaphore to protect block allocations.
It's not using the i_alloc_sem at all but some 'clever' behaviour with
downgrading the lock after the block allocations are done.
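
The downgrade pattern can be sketched in userspace roughly as follows.
This is not XFS code: dg_rwlock and its functions are hypothetical names,
built on a pthread mutex and condition variable, and the sketch makes no
attempt at writer fairness.  The idea it illustrates is taking the lock
exclusively while blocks are allocated, then converting it to shared for
the I/O itself without ever dropping it.  (In the kernel, rw_semaphores
offer downgrade_write() for this kind of conversion.)

```c
#include <assert.h>
#include <pthread.h>

struct dg_rwlock {
	pthread_mutex_t mu;
	pthread_cond_t cv;
	int readers; /* number of shared holders */
	int writer;  /* 1 while held exclusively */
};

void dg_init(struct dg_rwlock *l)
{
	pthread_mutex_init(&l->mu, NULL);
	pthread_cond_init(&l->cv, NULL);
	l->readers = 0;
	l->writer = 0;
}

/* Exclusive acquire: wait until there are no holders at all. */
void dg_write_lock(struct dg_rwlock *l)
{
	pthread_mutex_lock(&l->mu);
	while (l->writer || l->readers > 0)
		pthread_cond_wait(&l->cv, &l->mu);
	l->writer = 1;
	pthread_mutex_unlock(&l->mu);
}

void dg_write_unlock(struct dg_rwlock *l)
{
	pthread_mutex_lock(&l->mu);
	assert(l->writer == 1);
	l->writer = 0;
	pthread_cond_broadcast(&l->cv);
	pthread_mutex_unlock(&l->mu);
}

/* Shared acquire: wait only while an exclusive holder exists. */
void dg_read_lock(struct dg_rwlock *l)
{
	pthread_mutex_lock(&l->mu);
	while (l->writer)
		pthread_cond_wait(&l->cv, &l->mu);
	l->readers++;
	pthread_mutex_unlock(&l->mu);
}

void dg_read_unlock(struct dg_rwlock *l)
{
	pthread_mutex_lock(&l->mu);
	assert(l->readers > 0);
	if (--l->readers == 0)
		pthread_cond_broadcast(&l->cv); /* a writer may proceed */
	pthread_mutex_unlock(&l->mu);
}

/* Convert exclusive to shared atomically; the lock is never released. */
void dg_downgrade(struct dg_rwlock *l)
{
	pthread_mutex_lock(&l->mu);
	assert(l->writer == 1 && l->readers == 0);
	l->writer = 0;
	l->readers = 1;
	pthread_cond_broadcast(&l->cv); /* wake readers that were waiting */
	pthread_mutex_unlock(&l->mu);
}
```

A caller would use dg_write_lock(), do the allocation, call
dg_downgrade(), issue the I/O, and finish with dg_read_unlock().  Because
the downgrade happens under the internal mutex, no other writer can slip
in between the allocation and the I/O.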

> So, how about a patch that lets the fs provide a callback to
> acquire/release i_alloc_sem at the current sites (dio, notify_change)
> that work with it? Most file systems wouldn't provide a callback and
> the code would just use the sem as usual, but clustered guys could use
> dlm locking.

If we're going down that route I'd say provide the callback for
filesystems that actually need locking only, but there must be a better
way to do that.

Note that in any case you're doing lots of work for the buffered path
as well in aio.c that shouldn't be necessary with a bit of refactoring.



* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-25 17:45             ` Mark Fasheh
@ 2005-08-28 22:48               ` Greg KH
  2005-08-29 17:41                 ` Joel Becker
  0 siblings, 1 reply; 12+ messages in thread
From: Greg KH @ 2005-08-28 22:48 UTC (permalink / raw)
  To: Mark Fasheh
  Cc: Christoph Hellwig, Joel Becker, Andrew Morton, linux-fsdevel,
	Wim Coekaerts

On Thu, Aug 25, 2005 at 10:45:42AM -0700, Mark Fasheh wrote:
> On Thu, Aug 25, 2005 at 11:58:19AM +0200, Christoph Hellwig wrote:
> > > >  - there's still some procfs abuse
> > > 
> > > 	Specifics of what is abuse vs OK would be interesting.
> > 
> > You're using procfs for non-process data.
> 
> I'm not sure I understand this... Looking through /proc I see lots of
> subsystems using /proc in similar ways to us.  Or is there a very specific
> method which you have a problem with?

No new subsystems or code shall add /proc files that do not explicitly
pertain to process information.

thanks,

greg k-h


* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-28 22:48               ` Greg KH
@ 2005-08-29 17:41                 ` Joel Becker
  2005-08-29 19:29                   ` Miklos Szeredi
  0 siblings, 1 reply; 12+ messages in thread
From: Joel Becker @ 2005-08-29 17:41 UTC (permalink / raw)
  To: Greg KH
  Cc: Mark Fasheh, Christoph Hellwig, Andrew Morton, linux-fsdevel,
	Wim Coekaerts

On Sun, Aug 28, 2005 at 03:48:26PM -0700, Greg KH wrote:
> No new subsystems or code shall add /proc files that do not explicitly
> pertain to process information.

	Fair enough, where in /sys should such things go?  /proc/fs is a
well-known place, but there is no /sys/fs :-)

Joel

-- 

"Sometimes when reading Goethe I have the paralyzing suspicion
 that he is trying to be funny."
         - Guy Davenport

Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127



* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-29 17:41                 ` Joel Becker
@ 2005-08-29 19:29                   ` Miklos Szeredi
  2005-08-31  6:14                     ` Greg KH
  0 siblings, 1 reply; 12+ messages in thread
From: Miklos Szeredi @ 2005-08-29 19:29 UTC (permalink / raw)
  To: Joel.Becker; +Cc: greg, mark.fasheh, hch, akpm, linux-fsdevel, wim.coekaerts

> 	Fair enough, where in /sys should such things go?  /proc/fs is a
> well-known place, but there is no /sys/fs :-)

It's pretty easy to create.  I had a patch:

  http://marc.theaimsgroup.com/?l=linux-fsdevel&m=110099238515110&w=2

to which Greg had a comment:

  http://marc.theaimsgroup.com/?l=linux-fsdevel&m=110114650113580&w=2

which I fixed, but I can't find it anymore.

Miklos


* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-29 19:29                   ` Miklos Szeredi
@ 2005-08-31  6:14                     ` Greg KH
  2005-08-31  8:24                       ` Joel Becker
  2005-08-31 11:11                       ` Miklos Szeredi
  0 siblings, 2 replies; 12+ messages in thread
From: Greg KH @ 2005-08-31  6:14 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Joel.Becker, mark.fasheh, hch, akpm, linux-fsdevel, wim.coekaerts

On Mon, Aug 29, 2005 at 09:29:55PM +0200, Miklos Szeredi wrote:
> > 	Fair enough, where in /sys should such things go?  /proc/fs is a
> > well-known place, but there is no /sys/fs :-)

Actually, configfs should probably be mounted in /sys/kernel/config/
Just create that mount point and away you go (look at securityfs and
debugfs for examples of how to do this.)  It keeps the LSB people happy
that you don't go around creating new / directories.
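
Once the kernel provides that mount point, the userspace side is an
ordinary mount.  A minimal sketch using mount(2) follows; the helper name
mount_configfs is hypothetical, the call needs root privileges, and it
assumes the running kernel has configfs available.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Mount configfs on the given directory, e.g. "/sys/kernel/config".
 * Returns 0 on success and -1 on failure, with errno set by mount(2).
 */
int mount_configfs(const char *target)
{
	assert(target != NULL);

	if (mount("configfs", target, "configfs", 0, NULL) != 0) {
		int saved = errno; /* fprintf() may overwrite errno */

		fprintf(stderr, "mount configfs on %s: %s\n",
			target, strerror(saved));
		errno = saved;
		return -1;
	}
	return 0;
}
```

Calling mount_configfs("/sys/kernel/config") as root would be the
equivalent of "mount -t configfs configfs /sys/kernel/config".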

> It's pretty easy to create.  I had a patch:
> 
>   http://marc.theaimsgroup.com/?l=linux-fsdevel&m=110099238515110&w=2
> 
> to which Greg had a comment:
> 
>   http://marc.theaimsgroup.com/?l=linux-fsdevel&m=110114650113580&w=2
> 
> which I fixed, but I can't find it anymore.

Care to resend your fixed patch?

thanks,

greg k-h


* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-31  6:14                     ` Greg KH
@ 2005-08-31  8:24                       ` Joel Becker
  2005-08-31 11:11                       ` Miklos Szeredi
  1 sibling, 0 replies; 12+ messages in thread
From: Joel Becker @ 2005-08-31  8:24 UTC (permalink / raw)
  To: Greg KH
  Cc: Miklos Szeredi, mark.fasheh, hch, akpm, linux-fsdevel,
	wim.coekaerts

On Tue, Aug 30, 2005 at 11:14:24PM -0700, Greg KH wrote:
> On Mon, Aug 29, 2005 at 09:29:55PM +0200, Miklos Szeredi wrote:
> > > 	Fair enough, where in /sys should such things go?  /proc/fs is a
> > > well-known place, but there is no /sys/fs :-)
> 
> Actually, configfs should probably be mounted in /sys/kernel/config/

	We were speaking of stuff ocfs2 puts in /proc/fs/ocfs2 right now
(and a few ocfs2 sysctls too).  Christoph stated that all the proc stuff
for ocfs2 (/proc/fs and /proc/sys/fs) should come out of procfs and move
to sysfs.
	As far as configfs goes, I can't recall why you and I agreed on
/config over /sys/kernel/config, but I'm not against changing it on the
face of it.  I'll go hunt up our discussion.

Joel

-- 

"Conservative, n.  A statesman who is enamoured of existing evils,
 as distinguished from the Liberal, who wishes to replace them
 with others."
	- Ambrose Bierce, The Devil's Dictionary

Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127



* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-31  6:14                     ` Greg KH
  2005-08-31  8:24                       ` Joel Becker
@ 2005-08-31 11:11                       ` Miklos Szeredi
  1 sibling, 0 replies; 12+ messages in thread
From: Miklos Szeredi @ 2005-08-31 11:11 UTC (permalink / raw)
  To: greg; +Cc: Joel.Becker, mark.fasheh, hch, akpm, linux-fsdevel, wim.coekaerts

> > It's pretty easy to create.  I had a patch:
> > 
> >   http://marc.theaimsgroup.com/?l=linux-fsdevel&m=110099238515110&w=2
> > 
> > to which Greg had a comment:
> > 
> >   http://marc.theaimsgroup.com/?l=linux-fsdevel&m=110114650113580&w=2
> > 
> > which I fixed, but I can't find it anymore.
> 
> Care to resend your fixed patch?

OK, here it is (untested).  reiser4 in -mm used to create /sys/fs for
itself, but it doesn't seem to do so anymore, so it should be safe.

---
This patch adds an empty /sys/fs, which filesystems can use.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>

Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h	2005-08-31 12:51:30.000000000 +0200
+++ linux/include/linux/fs.h	2005-08-31 12:58:12.000000000 +0200
@@ -1251,6 +1251,9 @@ extern long do_mount(char *, char *, cha
 
 extern int vfs_statfs(struct super_block *, struct kstatfs *);
 
+/* /sys/fs */
+extern struct subsystem fs_subsys;
+
 #define FLOCK_VERIFY_READ  1
 #define FLOCK_VERIFY_WRITE 2
 
Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2005-08-19 14:58:53.000000000 +0200
+++ linux/fs/namespace.c	2005-08-31 12:55:48.000000000 +0200
@@ -43,6 +43,9 @@ static struct list_head *mount_hashtable
 static int hash_mask, hash_bits;
 static kmem_cache_t *mnt_cache; 
 
+/* /sys/fs */
+decl_subsys(fs, NULL, NULL);
+
 static inline unsigned long hash(struct vfsmount *mnt, struct dentry *dentry)
 {
 	unsigned long tmp = ((unsigned long) mnt / L1_CACHE_BYTES);
@@ -1453,6 +1456,7 @@ void __init mnt_init(unsigned long mempa
 		i--;
 	} while (i);
 	sysfs_init();
+	subsystem_register(&fs_subsys);
 	init_rootfs();
 	init_mount_tree();
 }


Thread overview: 12+ messages
     [not found] <20050822213220.GH19387@insight.us.oracle.com>
     [not found] ` <20050822144521.24494329.akpm@osdl.org>
     [not found]   ` <20050822215049.GI19387@insight.us.oracle.com>
     [not found]     ` <20050822150505.7978136d.akpm@osdl.org>
2005-08-24  7:18       ` [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs] Christoph Hellwig
2005-08-24 20:33         ` Joel Becker
2005-08-25  9:58           ` Christoph Hellwig
2005-08-25 17:45             ` Mark Fasheh
2005-08-28 22:48               ` Greg KH
2005-08-29 17:41                 ` Joel Becker
2005-08-29 19:29                   ` Miklos Szeredi
2005-08-31  6:14                     ` Greg KH
2005-08-31  8:24                       ` Joel Becker
2005-08-31 11:11                       ` Miklos Szeredi
2005-08-25 18:45             ` Zach Brown
2005-08-25 20:23               ` Christoph Hellwig
