* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
       [not found] ` <20050822150505.7978136d.akpm@osdl.org>
@ 2005-08-24  7:18 ` Christoph Hellwig
  2005-08-24 20:33   ` Joel Becker
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2005-08-24  7:18 UTC
  To: Andrew Morton; +Cc: Joel Becker, mark.fasheh, linux-fsdevel

> > > BTW, from where I sit, ocfs2 is "on hold" due to some additional work which
> > > hch identified when I was on vacation and not paying much attention.  vma
> > > walk, perhaps?
> >
> > 	I don't know of anything that should put it "on hold".  Copying
> > Mark on this.  Mark?
>
> (cc hch)
>
> On 10 Aug Christoph told me "While OCFS is evolving really nicely there's a
> bunch of major things that need to be sorted out".
>
> Christoph, could you please itemise these things?

Major items known:

 - oracore workarounds must go away
 - magic symlinks that pollute the posix filename namespace must go
   away
 - vma-walking locking must move to common code (zab is working on that
   afaik)
 - the buffered aio mess needs sorting out.  imho the best thing was
   to just drop that code from ocfs for now and let oracle work with
   bcrl and suparna to make sure their buffered aio code works nicely
   with ocfs and/or picks up some of their ideas
 - there's still some procfs abuse

That's just the off my head things, the oracle people actually asked
me to wait with a review until they've cleared their TODO lists, I'll
do a real review once I'll get some time.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-24  7:18 ` [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs] Christoph Hellwig
@ 2005-08-24 20:33 ` Joel Becker
  2005-08-25  9:58   ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Joel Becker @ 2005-08-24 20:33 UTC
  To: Christoph Hellwig; +Cc: Andrew Morton, mark.fasheh, linux-fsdevel

On Wed, Aug 24, 2005 at 09:18:35AM +0200, Christoph Hellwig wrote:
>  - oracore workarounds must go away

	These are not part of the linus-ward submission.

>  - magic symlinks that pollute the posix filename namespace must go
>    away

	We're still trying to come up with a way to solve the problem
without magic symlinks.  Suggestions still welcome.

>  - vma-walking locking must move to common code (zab is working on that
>    afaik)

	The vma-walking will go away, replaced by another mmap scheme
entirely.  However, that's three or four months away.  The current code
is merely a stopgap for now.
	Many folks have an interest in having a cluster filesystem in
mainline.  This seems like an issue that can be resolved later, not a
big blocker.  That is, it would be worth more to people to have it in
mainline for the next four months, knowing this will get fixed, than
keeping it out of mainline for four months over this feature.

>  - the buffered aio mess needs sorting out.  imho the best thing was

	Well, that's a mainline problem.  Yes, we should all work
towards improving mainline.  But again I'm not sure others are served
keeping OCFS2 out over this.

>  - there's still some procfs abuse

	Specifics of what is abuse vs OK would be interesting.

> That's just the off my head things, the oracle people actually asked
> me to wait with a review until they've cleared their TODO lists, I'll
> do a real review once I'll get some time.

	There are some sizeable things on this "top of the head" list
already.  I'd like to find a nice balance between "goes to mainline now"
and "must be a perfect piece of software before it goes in."  No
software will ever be perfect.

Joel

-- 

"For every complex problem there exists a solution that is brief,
 concise, and totally wrong."
        -Unknown

http://www.jlbec.org/
jlbec@evilplan.org
* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-24 20:33 ` Joel Becker
@ 2005-08-25  9:58   ` Christoph Hellwig
  2005-08-25 17:45     ` Mark Fasheh
  2005-08-25 18:45     ` Zach Brown
  0 siblings, 2 replies; 12+ messages in thread
From: Christoph Hellwig @ 2005-08-25  9:58 UTC
  To: Joel Becker; +Cc: Andrew Morton, mark.fasheh, linux-fsdevel

On Wed, Aug 24, 2005 at 01:33:52PM -0700, Joel Becker wrote:
> On Wed, Aug 24, 2005 at 09:18:35AM +0200, Christoph Hellwig wrote:
> >  - oracore workarounds must go away
>
> 	These are not part of the linus-ward submission.

Ok.

> >  - magic symlinks that pollute the posix filename namespace must go
> >    away
>
> 	We're still trying to come up with a way to solve the problem
> without magic symlinks.  Suggestions still welcome.

That's fine, you're free to come up whatever problem you have of course ;-)
Doesn't mean we're gonna put the broken variant into mainline, though.

> >  - vma-walking locking must move to common code (zab is working on that
> >    afaik)
>
> 	The vma-walking will go away, replaced by another mmap scheme
> entirely.  However, that's three or four months away.  The current code
> is merely a stopgap for now.
> 	Many folks have an interest in having a cluster filesystem in
> mainline.  This seems like an issue that can be resolved later, not a
> big blocker.  That is, it would be worth more to people to have it in
> mainline for the next four months, knowing this will get fixed, than
> keeping it out of mainline for four months over this feature.

I don't think it'll take four months, but we're setting a bad precedent
here - GFS pretty much duplicates the same mess and if we let that
in we're growing more and more of it.  Please get it right conceptually
first, it doesn't have to be perfect.

> >  - the buffered aio mess needs sorting out.  imho the best thing was
>
> 	Well, that's a mainline problem.  Yes, we should all work
> towards improving mainline.  But again I'm not sure others are served
> keeping OCFS2 out over this.

Currently we don't support buffered aio on any filesystem in
mainline, so adding crufty code to mainline sounds like a bad idea.
Zab agreed on that and wants to remove it as much as it gets.

> >  - there's still some procfs abuse
>
> 	Specifics of what is abuse vs OK would be interesting.

You're using procfs for non-process data.

> > That's just the off my head things, the oracle people actually asked
> > me to wait with a review until they've cleared their TODO lists, I'll
> > do a real review once I'll get some time.
>
> 	There are some sizeable things on this "top of the head" list
> already.  I'd like to find a nice balance between "goes to mainline now"
> and "must be a perfect piece of software before it goes in."  No
> software will ever be perfect.

That's really the big things, it's certainly not perfect after that.
Note that it's often easier to drop unfinished/messy/controversial
features and do them right later.  That avoids the everything must
be perfect syndrome while keeping out the big mess.
* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-25  9:58 ` Christoph Hellwig
@ 2005-08-25 17:45   ` Mark Fasheh
  2005-08-28 22:48     ` Greg KH
  2005-08-25 18:45   ` Zach Brown
  1 sibling, 1 reply; 12+ messages in thread
From: Mark Fasheh @ 2005-08-25 17:45 UTC
  To: Christoph Hellwig
  Cc: Joel Becker, Andrew Morton, linux-fsdevel, Wim Coekaerts

On Thu, Aug 25, 2005 at 11:58:19AM +0200, Christoph Hellwig wrote:
> > 	The vma-walking will go away, replaced by another mmap scheme
> > entirely.  However, that's three or four months away.  The current code
> > is merely a stopgap for now.
> > 	Many folks have an interest in having a cluster filesystem in
> > mainline.  This seems like an issue that can be resolved later, not a
> > big blocker.  That is, it would be worth more to people to have it in
> > mainline for the next four months, knowing this will get fixed, than
> > keeping it out of mainline for four months over this feature.
>
> I don't think it'll take four months, but we're setting a bad precedent
> here - GFS pretty much duplicates the same mess and if we let that
> in we're growing more and more of it.  Please get it right conceptually
> first, it doesn't have to be perfect.

We're fixing this by taking a completely different approach from what is
done today, which won't involve vma walking.  It *will* take some time
however, at least in testing and validation.  In the meantime I'd rather not
see us do all the work to port something to the VFS which we won't even be
using in a few months time.

> > >  - there's still some procfs abuse
> >
> > 	Specifics of what is abuse vs OK would be interesting.
>
> You're using procfs for non-process data.

I'm not sure I understand this...  Looking through /proc I see lots of
subsystems using /proc in similar ways to us.  Or is there a very specific
method which you have a problem with?
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh@oracle.com
* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-25 17:45 ` Mark Fasheh
@ 2005-08-28 22:48   ` Greg KH
  2005-08-29 17:41     ` Joel Becker
  0 siblings, 1 reply; 12+ messages in thread
From: Greg KH @ 2005-08-28 22:48 UTC
  To: Mark Fasheh
  Cc: Christoph Hellwig, Joel Becker, Andrew Morton, linux-fsdevel,
	Wim Coekaerts

On Thu, Aug 25, 2005 at 10:45:42AM -0700, Mark Fasheh wrote:
> On Thu, Aug 25, 2005 at 11:58:19AM +0200, Christoph Hellwig wrote:
> > > >  - there's still some procfs abuse
> > >
> > > 	Specifics of what is abuse vs OK would be interesting.
> >
> > You're using procfs for non-process data.
>
> I'm not sure I understand this...  Looking through /proc I see lots of
> subsystems using /proc in similar ways to us.  or is there a very specific
> method which you have a problem with?

No new subsystems or code shall add /proc files that do not explicitly
pertain to process information.

thanks,

greg k-h
* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-28 22:48 ` Greg KH
@ 2005-08-29 17:41   ` Joel Becker
  2005-08-29 19:29     ` Miklos Szeredi
  0 siblings, 1 reply; 12+ messages in thread
From: Joel Becker @ 2005-08-29 17:41 UTC
  To: Greg KH
  Cc: Mark Fasheh, Christoph Hellwig, Andrew Morton, linux-fsdevel,
	Wim Coekaerts

On Sun, Aug 28, 2005 at 03:48:26PM -0700, Greg KH wrote:
> No new subsystems or code shall add /proc files that do not explicitly
> pertain to process information.

	Fair enough, where in /sys should such things go?  /proc/fs is a
well-known place, but there is no /sys/fs :-)

Joel

-- 

"Sometimes when reading Goethe I have the paralyzing suspicion
 that he is trying to be funny."
         - Guy Davenport

Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-29 17:41 ` Joel Becker
@ 2005-08-29 19:29   ` Miklos Szeredi
  2005-08-31  6:14     ` Greg KH
  0 siblings, 1 reply; 12+ messages in thread
From: Miklos Szeredi @ 2005-08-29 19:29 UTC
  To: Joel.Becker; +Cc: greg, mark.fasheh, hch, akpm, linux-fsdevel, wim.coekaerts

> 	Fair enough, where in /sys should such things go?  /proc/fs is a
> well-known place, but there is no /sys/fs :-)

It's pretty easy to create.  I had a patch:

  http://marc.theaimsgroup.com/?l=linux-fsdevel&m=110099238515110&w=2

to which Greg had a comment:

  http://marc.theaimsgroup.com/?l=linux-fsdevel&m=110114650113580&w=2

which I fixed, but I can't find it anymore.

Miklos
* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-29 19:29 ` Miklos Szeredi
@ 2005-08-31  6:14   ` Greg KH
  2005-08-31  8:24     ` Joel Becker
  2005-08-31 11:11     ` Miklos Szeredi
  0 siblings, 2 replies; 12+ messages in thread
From: Greg KH @ 2005-08-31  6:14 UTC
  To: Miklos Szeredi
  Cc: Joel.Becker, mark.fasheh, hch, akpm, linux-fsdevel, wim.coekaerts

On Mon, Aug 29, 2005 at 09:29:55PM +0200, Miklos Szeredi wrote:
> > 	Fair enough, where in /sys should such things go?  /proc/fs is a
> > well-known place, but there is no /sys/fs :-)

Actually, configfs should probably be mounted in /sys/kernel/config/
Just create that mount point and away you go (look at securityfs and
debugfs for examples of how to do this.)

It keeps the LSB people happy that you don't go around creating new /
directories.

> It's pretty easy to create.  I had a patch:
>
>   http://marc.theaimsgroup.com/?l=linux-fsdevel&m=110099238515110&w=2
>
> to which Greg had a comment:
>
>   http://marc.theaimsgroup.com/?l=linux-fsdevel&m=110114650113580&w=2
>
> which I fixed, but I can't find it anymore.

Care to resend your fixed patch?

thanks,

greg k-h
* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-31  6:14 ` Greg KH
@ 2005-08-31  8:24   ` Joel Becker
  2005-08-31 11:11   ` Miklos Szeredi
  1 sibling, 0 replies; 12+ messages in thread
From: Joel Becker @ 2005-08-31  8:24 UTC
  To: Greg KH
  Cc: Miklos Szeredi, mark.fasheh, hch, akpm, linux-fsdevel, wim.coekaerts

On Tue, Aug 30, 2005 at 11:14:24PM -0700, Greg KH wrote:
> On Mon, Aug 29, 2005 at 09:29:55PM +0200, Miklos Szeredi wrote:
> > > 	Fair enough, where in /sys should such things go?  /proc/fs is a
> > > well-known place, but there is no /sys/fs :-)
>
> Actually, configfs should probably be mounted in /sys/kernel/config/

	We were speaking of stuff ocfs2 puts in /proc/fs/ocfs2 right now
(and a few ocfs2 sysctls too).  Christoph stated that all the proc stuff
for ocfs2 (/proc/fs and /proc/sys/fs) should come out of procfs and move
to sysfs.
	As far as configfs goes, I can't recall why you and I agreed on
/config over /sys/kernel/config, but I'm not against changing it on the
face of it.  I'll go hunt up our discussion.

Joel

-- 

"Conservative, n.  A statesman who is enamoured of existing evils,
 as distinguished from the Liberal, who wishes to replace them
 with others."
	- Ambrose Bierce, The Devil's Dictionary

Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-31  6:14 ` Greg KH
  2005-08-31  8:24   ` Joel Becker
@ 2005-08-31 11:11   ` Miklos Szeredi
  1 sibling, 0 replies; 12+ messages in thread
From: Miklos Szeredi @ 2005-08-31 11:11 UTC
  To: greg; +Cc: Joel.Becker, mark.fasheh, hch, akpm, linux-fsdevel, wim.coekaerts

> > It's pretty easy to create.  I had a patch:
> >
> >   http://marc.theaimsgroup.com/?l=linux-fsdevel&m=110099238515110&w=2
> >
> > to which Greg had a comment:
> >
> >   http://marc.theaimsgroup.com/?l=linux-fsdevel&m=110114650113580&w=2
> >
> > which I fixed, but I can't find it anymore.
>
> Care to resend your fixed patch?

OK, here it is (untested).

reiser4 in -mm used to create /sys/fs for itself, but it doesn't seem
to do so anymore, so it should be safe.

---
This patch adds an empty /sys/fs, which filesystems can use.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>

Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h	2005-08-31 12:51:30.000000000 +0200
+++ linux/include/linux/fs.h	2005-08-31 12:58:12.000000000 +0200
@@ -1251,6 +1251,9 @@ extern long do_mount(char *, char *, cha
 
 extern int vfs_statfs(struct super_block *, struct kstatfs *);
 
+/* /sys/fs */
+extern struct subsystem fs_subsys;
+
 #define FLOCK_VERIFY_READ  1
 #define FLOCK_VERIFY_WRITE 2
 
Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2005-08-19 14:58:53.000000000 +0200
+++ linux/fs/namespace.c	2005-08-31 12:55:48.000000000 +0200
@@ -43,6 +43,9 @@ static struct list_head *mount_hashtable
 static int hash_mask, hash_bits;
 static kmem_cache_t *mnt_cache;
 
+/* /sys/fs */
+decl_subsys(fs, NULL, NULL);
+
 static inline unsigned long hash(struct vfsmount *mnt, struct dentry *dentry)
 {
 	unsigned long tmp = ((unsigned long) mnt / L1_CACHE_BYTES);
@@ -1453,6 +1456,7 @@ void __init mnt_init(unsigned long mempa
 		i--;
 	} while (i);
 	sysfs_init();
+	subsystem_register(&fs_subsys);
 	init_rootfs();
 	init_mount_tree();
 }
* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-25  9:58 ` Christoph Hellwig
  2005-08-25 17:45   ` Mark Fasheh
@ 2005-08-25 18:45   ` Zach Brown
  2005-08-25 20:23     ` Christoph Hellwig
  1 sibling, 1 reply; 12+ messages in thread
From: Zach Brown @ 2005-08-25 18:45 UTC
  To: Christoph Hellwig; +Cc: Joel Becker, Andrew Morton, mark.fasheh, linux-fsdevel

> Currently we don't support buffered aio on any filesystem in
> mainline, so adding crufty code to mainline sounds like a bad idea.
> Zab agreed on that and wants to remove it as much as it gets.

Yeah, we aim to simplify this code.  For the record, it wasn't buffered
aio that was the problem.  There were two naughty moving parts:

First, trying not to block in the dlm when issuing aio ops and tracking
state to restart after dlm ops returned eiocbqueued.  This was just
overly aggressive.  This can behave like block mapping lookups in that
it rarely blocks.  Most aio that people care about (direct io writes to
already allocated regions) will simply be acquiring and releasing
shared-read locks around each op -- trivial local operations.

Second, trying to hold dlm locks around the entirety of aio ops.  This
led to the mess of trying to tear down locks in the iocb dtor method.
(which can then race with unmount, aio does __fput on the filp, dropping
the vfsmount ref, before calling dtor.. bleh).  We can get around this
by unlocking after performing the block mapping lookups and issuing the
io and introducing a cluster DLM lock which behaves like i_alloc_sem.

So, how about a patch that lets the fs provide a callback to
acquire/release i_alloc_sem at the current sites (dio, notify_change)
that work with it?  Most file systems wouldn't provide a callback and
the code would just use the sem as usual, but clustered guys could use
dlm locking.

- z
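[Editorial sketch, not part of the original message.]  The callback idea
in the last paragraph above can be illustrated in userspace C.  This is
only a minimal sketch under stated assumptions: every sketch_* name is
hypothetical, the integer counters merely stand in for i_alloc_sem and
for a cluster DLM lock, and none of this is taken from the ocfs2 or VFS
sources.  It shows the dispatch only: use the filesystem's callback when
one is provided, otherwise fall back to the ordinary semaphore.

```c
#include <stddef.h>

struct sketch_inode;

/* Hypothetical hook table: a cluster fs fills it in, others leave it NULL. */
struct sketch_alloc_lock_ops {
	void (*lock_shared)(struct sketch_inode *inode);
	void (*unlock_shared)(struct sketch_inode *inode);
};

struct sketch_inode {
	int alloc_sem_readers;                    /* stand-in for i_alloc_sem */
	int dlm_shared_holders;                   /* stand-in for a cluster lock */
	const struct sketch_alloc_lock_ops *ops;  /* NULL: use the semaphore */
};

enum sketch_path { SKETCH_USED_SEM, SKETCH_USED_CALLBACK };

/* Take the allocation lock shared, preferring the fs-provided callback. */
static enum sketch_path sketch_alloc_lock(struct sketch_inode *inode)
{
	if (inode->ops && inode->ops->lock_shared) {
		inode->ops->lock_shared(inode);
		return SKETCH_USED_CALLBACK;
	}
	inode->alloc_sem_readers++;  /* kernel code would call down_read() here */
	return SKETCH_USED_SEM;
}

/* Release whatever sketch_alloc_lock() took. */
static void sketch_alloc_unlock(struct sketch_inode *inode)
{
	if (inode->ops && inode->ops->unlock_shared) {
		inode->ops->unlock_shared(inode);
		return;
	}
	inode->alloc_sem_readers--;  /* kernel code would call up_read() here */
}

/* Stand-in cluster callbacks; a real fs would talk to its DLM instead. */
static void sketch_dlm_lock_shared(struct sketch_inode *inode)
{
	inode->dlm_shared_holders++;
}

static void sketch_dlm_unlock_shared(struct sketch_inode *inode)
{
	inode->dlm_shared_holders--;
}

static const struct sketch_alloc_lock_ops sketch_cluster_ops = {
	.lock_shared = sketch_dlm_lock_shared,
	.unlock_shared = sketch_dlm_unlock_shared,
};
```

An inode whose ops pointer is NULL takes the semaphore path, while one
initialised with &sketch_cluster_ops goes through the stand-in DLM
callbacks and never touches the semaphore counter.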
* Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
  2005-08-25 18:45 ` Zach Brown
@ 2005-08-25 20:23   ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2005-08-25 20:23 UTC
  To: Zach Brown
  Cc: Christoph Hellwig, Joel Becker, Andrew Morton, mark.fasheh,
	linux-fsdevel

On Thu, Aug 25, 2005 at 11:45:14AM -0700, Zach Brown wrote:
> Yeah, we aim to simplify this code.  For the record, it wasn't buffered
> aio that was the problem.  There were two naughty moving parts:
>
> First, trying not to block in the dlm when issuing aio ops and tracking
> state to restart after dlm ops returned eiocbqueued.  This was just
> overly aggressive.  This can behave like block mapping lookups in that
> it rarely blocks.  Most aio that people care about (direct io writes to
> already allocated regions) will simply be acquiring and releasing
> shared-read locks around each op -- trivial local operations.
>
> Second, trying to hold dlm locks around the entirety of aio ops.  This
> led to the mess of trying to tear down locks in the iocb dtor method.
> (which can then race with unmount, aio does __fput on the filp, dropping
> the vfsmount ref, before calling dtor.. bleh).  We can get around this
> by unlocking after performing the block mapping lookups and issuing the
> io and introducing a cluster DLM lock which behaves like i_alloc_sem.

You might want to look at XFS as a model for this.  While it's not
clustered it has its own r/w semaphore to protect block allocations.
It's not using the i_alloc_sem at all but some 'clever' behaviour with
downgrading the lock after the block allocations are done.

> So, how about a patch that lets the fs provide a callback to
> acquire/release i_alloc_sem at the current sites (dio, notify_change)
> that work with it?  Most file systems wouldn't provide a callback and
> the code would just use the sem as usual, but clustered guys could use
> dlm locking.

If we're going down that route I'd say provide the callback only for
filesystems that actually need locking, but there must be a better way
to do that.  Note that in any case you're doing lots of work for the
buffered path as well in aio.c that should be unnecessary with a bit of
refactoring.
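[Editorial sketch, not part of the original message.]  The "downgrade
after block allocation" behaviour mentioned above can be sketched in
userspace C.  This is an illustration only, assuming a single-threaded,
trylock-only stand-in for a reader/writer semaphore; it is not XFS code,
and every sketch_* name is hypothetical.  (The kernel rw_semaphore API
does provide a real downgrade_write() primitive; the stand-in below only
mimics its effect.)

```c
/* Hypothetical single-threaded stand-in for a kernel rw_semaphore. */
struct sketch_rwsem {
	int readers;  /* number of shared holders */
	int writer;   /* 1 while held exclusively */
};

/* Shared acquisition succeeds unless a writer holds the lock. */
static int sketch_down_read_trylock(struct sketch_rwsem *sem)
{
	if (sem->writer)
		return 0;
	sem->readers++;
	return 1;
}

/* Exclusive acquisition succeeds only when nobody holds the lock. */
static int sketch_down_write_trylock(struct sketch_rwsem *sem)
{
	if (sem->writer || sem->readers)
		return 0;
	sem->writer = 1;
	return 1;
}

/* Caller must hold the lock exclusively; it ends up holding it shared. */
static void sketch_downgrade_write(struct sketch_rwsem *sem)
{
	sem->writer = 0;
	sem->readers = 1;
}

static void sketch_up_read(struct sketch_rwsem *sem)
{
	sem->readers--;
}

/*
 * Allocate blocks under the exclusive lock, then downgrade so that other
 * readers can proceed while the I/O itself is in flight.  Returns 1 with
 * the lock held shared (release with sketch_up_read() once the I/O is
 * done), or 0 if the lock was busy.
 */
static int sketch_prepare_write(struct sketch_rwsem *sem,
				int *blocks_allocated, int blocks_needed)
{
	if (!sketch_down_write_trylock(sem))
		return 0;
	*blocks_allocated += blocks_needed;  /* allocation needs exclusivity */
	sketch_downgrade_write(sem);         /* the I/O only needs shared access */
	return 1;
}
```

The point of the pattern is that exclusivity is held only for the
allocation step; once downgraded, other shared holders are admitted
again, while a new exclusive holder has to wait until all shared
holders have released.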
end of thread, other threads:[~2005-08-31 11:11 UTC | newest]
Thread overview: 12+ messages
[not found] <20050822213220.GH19387@insight.us.oracle.com>
[not found] ` <20050822144521.24494329.akpm@osdl.org>
[not found] ` <20050822215049.GI19387@insight.us.oracle.com>
[not found] ` <20050822150505.7978136d.akpm@osdl.org>
2005-08-24 7:18 ` [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs] Christoph Hellwig
2005-08-24 20:33 ` Joel Becker
2005-08-25 9:58 ` Christoph Hellwig
2005-08-25 17:45 ` Mark Fasheh
2005-08-28 22:48 ` Greg KH
2005-08-29 17:41 ` Joel Becker
2005-08-29 19:29 ` Miklos Szeredi
2005-08-31 6:14 ` Greg KH
2005-08-31 8:24 ` Joel Becker
2005-08-31 11:11 ` Miklos Szeredi
2005-08-25 18:45 ` Zach Brown
2005-08-25 20:23 ` Christoph Hellwig