* [Cluster-devel] Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() [not found] ` <1229095751-23984-1-git-send-email-louis.rilling@kerlabs.com> @ 2008-12-17 21:40 ` Andrew Morton 2008-12-17 22:03 ` Joel Becker [not found] ` <1229585208.9487.112.camel@twins> 0 siblings, 2 replies; 8+ messages in thread From: Andrew Morton @ 2008-12-17 21:40 UTC (permalink / raw) To: cluster-devel.redhat.com On Fri, 12 Dec 2008 16:29:11 +0100 Louis Rilling <louis.rilling@kerlabs.com> wrote: > When attaching default groups (subdirs) of a new group (in mkdir() or > in configfs_register()), configfs recursively takes inode's mutexes > along the path from the parent of the new group to the default > subdirs. This is needed to ensure that the VFS will not race with > operations on these sub-dirs. This is safe for the following reasons: > > - the VFS allows one to lock first an inode and second one of its > children (The lock subclasses for this pattern are respectively > I_MUTEX_PARENT and I_MUTEX_CHILD); > - from this rule any inode path can be recursively locked in > descending order as long as it stays under a single mountpoint and > does not follow symlinks. > > Unfortunately lockdep does not know (yet?) how to handle such > recursion. > > I've tried to use Peter Zijlstra's lock_set_subclass() helper to > upgrade i_mutexes from I_MUTEX_CHILD to I_MUTEX_PARENT when we know > that we might recursively lock some of their descendant, but this > usage does not seem to fit the purpose of lock_set_subclass() because > it leads to several i_mutex locked with subclass I_MUTEX_PARENT by > the same task. > > >From inside configfs it is not possible to serialize those recursive > locking with a top-level one, because mkdir() and rmdir() are already > called with inodes locked by the VFS. So using some > mutex_lock_nest_lock() is not an option. > > I am proposing two solutions: > 1) one that wraps recursive mutex_lock()s with > lockdep_off()/lockdep_on(). > 2) (as suggested earlier by Peter Zijlstra) one that puts the > i_mutexes recursively locked in different classes based on their > depth from the top-level config_group created. This > induces an arbitrary limit (MAX_LOCK_DEPTH - 2 == 46) on the > nesting of configfs default groups whenever lockdep is activated > but this limit looks reasonably high. Unfortunately, this alos > isolates VFS operations on configfs default groups from the others > and thus lowers the chances to detect locking issues. > > This patch implements solution 1). > > Solution 2) looks better from lockdep's point of view, but fails with > configfs_depend_item(). This needs to rework the locking > scheme of configfs_depend_item() by removing the variable lock recursion > depth, and I think that it's doable thanks to the configfs_dirent_lock. > For now, let's stick to solution 1). > > Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> > --- > fs/configfs/dir.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 files changed, 59 insertions(+), 0 deletions(-) > > diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c > index 8e93341..9c23583 100644 > --- a/fs/configfs/dir.c > +++ b/fs/configfs/dir.c > @@ -553,12 +553,24 @@ static void detach_groups(struct config_group *group) > > child = sd->s_dentry; > > + /* > + * Note: we hide this from lockdep since we have no way > + * to teach lockdep about recursive > + * I_MUTEX_PARENT -> I_MUTEX_CHILD patterns along a path > + * in an inode tree, which are valid as soon as > + * I_MUTEX_PARENT -> I_MUTEX_CHILD is valid from a > + * parent inode to one of its children. > + */ > + lockdep_off(); > mutex_lock(&child->d_inode->i_mutex); > + lockdep_on(); > > [etc] > Oh dear, what an unpleasant patch. Peter, can this be saved? ^ permalink raw reply [flat|nested] 8+ messages in thread
* [Cluster-devel] Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2008-12-17 21:40 ` [Cluster-devel] Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() Andrew Morton @ 2008-12-17 22:03 ` Joel Becker 2008-12-17 22:09 ` Andrew Morton [not found] ` <1229585208.9487.112.camel@twins> 1 sibling, 1 reply; 8+ messages in thread From: Joel Becker @ 2008-12-17 22:03 UTC (permalink / raw) To: cluster-devel.redhat.com On Wed, Dec 17, 2008 at 01:40:20PM -0800, Andrew Morton wrote: > On Fri, 12 Dec 2008 16:29:11 +0100 > Louis Rilling <louis.rilling@kerlabs.com> wrote: > > > When attaching default groups (subdirs) of a new group (in mkdir() or > > in configfs_register()), configfs recursively takes inode's mutexes > > along the path from the parent of the new group to the default > > subdirs. This is needed to ensure that the VFS will not race with > > operations on these sub-dirs. This is safe for the following reasons: > > > > - the VFS allows one to lock first an inode and second one of its > > children (The lock subclasses for this pattern are respectively > > I_MUTEX_PARENT and I_MUTEX_CHILD); > > - from this rule any inode path can be recursively locked in > > descending order as long as it stays under a single mountpoint and > > does not follow symlinks. > > > > Unfortunately lockdep does not know (yet?) how to handle such > > recursion. > > > > I've tried to use Peter Zijlstra's lock_set_subclass() helper to > > upgrade i_mutexes from I_MUTEX_CHILD to I_MUTEX_PARENT when we know > > that we might recursively lock some of their descendant, but this > > usage does not seem to fit the purpose of lock_set_subclass() because > > it leads to several i_mutex locked with subclass I_MUTEX_PARENT by > > the same task. > > > > >From inside configfs it is not possible to serialize those recursive > > locking with a top-level one, because mkdir() and rmdir() are already > > called with inodes locked by the VFS. So using some > > mutex_lock_nest_lock() is not an option. > > > > I am proposing two solutions: > > 1) one that wraps recursive mutex_lock()s with > > lockdep_off()/lockdep_on(). > > 2) (as suggested earlier by Peter Zijlstra) one that puts the > > i_mutexes recursively locked in different classes based on their > > depth from the top-level config_group created. This > > induces an arbitrary limit (MAX_LOCK_DEPTH - 2 == 46) on the > > nesting of configfs default groups whenever lockdep is activated > > but this limit looks reasonably high. Unfortunately, this alos > > isolates VFS operations on configfs default groups from the others > > and thus lowers the chances to detect locking issues. > > > > This patch implements solution 1). > > > > Solution 2) looks better from lockdep's point of view, but fails with > > configfs_depend_item(). This needs to rework the locking > > scheme of configfs_depend_item() by removing the variable lock recursion > > depth, and I think that it's doable thanks to the configfs_dirent_lock. > > For now, let's stick to solution 1). > > > > Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> > > --- > > fs/configfs/dir.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > > 1 files changed, 59 insertions(+), 0 deletions(-) > > > > diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c > > index 8e93341..9c23583 100644 > > --- a/fs/configfs/dir.c > > +++ b/fs/configfs/dir.c > > @@ -553,12 +553,24 @@ static void detach_groups(struct config_group *group) > > > > child = sd->s_dentry; > > > > + /* > > + * Note: we hide this from lockdep since we have no way > > + * to teach lockdep about recursive > > + * I_MUTEX_PARENT -> I_MUTEX_CHILD patterns along a path > > + * in an inode tree, which are valid as soon as > > + * I_MUTEX_PARENT -> I_MUTEX_CHILD is valid from a > > + * parent inode to one of its children. > > + */ > > + lockdep_off(); > > mutex_lock(&child->d_inode->i_mutex); > > + lockdep_on(); > > > > [etc] > > > > Oh dear, what an unpleasant patch. > > Peter, can this be saved? I'd love to see it work within lockdep, but it seems rather hard, so that's why I recommended Louis cook up this version. I see you picked it up in -mm. Do you want me to push it through ocfs2.git? Joel -- Life's Little Instruction Book #157 "Take time to smell the roses." Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [Cluster-devel] Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2008-12-17 22:03 ` Joel Becker @ 2008-12-17 22:09 ` Andrew Morton 0 siblings, 0 replies; 8+ messages in thread From: Andrew Morton @ 2008-12-17 22:09 UTC (permalink / raw) To: cluster-devel.redhat.com On Wed, 17 Dec 2008 14:03:10 -0800 Joel Becker <Joel.Becker@oracle.com> wrote: > > > > > > [etc] > > > > > > > Oh dear, what an unpleasant patch. > > > > Peter, can this be saved? > > I'd love to see it work within lockdep, but it seems rather > hard, so that's why I recommended Louis cook up this version. I see you > picked it up in -mm. Do you want me to push it through ocfs2.git? Please do. If Peter refuses to save us. ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <1229585208.9487.112.camel@twins>]
* [Cluster-devel] Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() [not found] ` <1229585208.9487.112.camel@twins> @ 2008-12-18 9:27 ` Joel Becker 2008-12-18 11:26 ` Steven Whitehouse [not found] ` <1229601399.9487.218.camel@twins> 0 siblings, 2 replies; 8+ messages in thread From: Joel Becker @ 2008-12-18 9:27 UTC (permalink / raw) To: cluster-devel.redhat.com On Thu, Dec 18, 2008 at 08:26:48AM +0100, Peter Zijlstra wrote: > On Wed, 2008-12-17 at 13:40 -0800, Andrew Morton wrote: > > On Fri, 12 Dec 2008 16:29:11 +0100 > > Louis Rilling <louis.rilling@kerlabs.com> wrote: > > > >From inside configfs it is not possible to serialize those recursive > > > locking with a top-level one, because mkdir() and rmdir() are already > > > called with inodes locked by the VFS. So using some > > > mutex_lock_nest_lock() is not an option. <snip> > > > > Oh dear, what an unpleasant patch. > > > > Peter, can this be saved? > > I'm still not sure why configfs is the only virtual filesystem that > suffers this - something in its design is weird (and not so wonderfull). > > Also, while I usually applaud fine grained locking, is configfs really > in need of that?, I mean, its not like its meant to create and remove > directories all day every day at breakneck speed, right? AFAIK you just > stomp some data in to configure some kernel stuff and then let it sit > for the duration of whatever that kernel thing does while it does it. It's not about breakneck speed, it's about living in the VFS. But I think you get that later. > See I'm not even clear on what configfs is.. and why its better than > sysfs for example.. - /me goes read configfs.txt > > Right, so basically we avoid syscalls by making vfs ops do stuff.. > lovely - still not seeing it though - regular filesystems seems to cope > just fine and they get to create arbitrary tree structures too. It's about the default_groups and how they build up and tear down small bits of tree. A simple creation of a config_item, a mkdir(2), is a normal VFS lock set and doesn't make lockdep unhappy. But if the new config_item has a default_group or two, they need locking too. Not so much on mkdir(2), but on rmdir(2). > Yes lockdep has 2 major weaknesses - 1) it doesn't support arbitrary > lock depths (and I've tried, twice, to fix that, but its a _hard_ > problem) and 2) it can't deal with arbitrary recursion. I know it's hard, or I'd have sent you patches :-) In fact, Louis tried to use the subclass bits to make this work to a depth of N (where N was probably deep enough in practice). However, this creates subclasses that don't get seen by the regular VFS locking - and the big deal here is making sure configfs's use of i_mutex meshes with the VFS. That is, his code made the warnings go away, but removed much of lockdep's ability to see when we got the locking wrong. > The thing is, in practise it turns out that reworking code to not run > into these issues often makes the code better - if only for the fact > that taking locks is expensive and doing less is better, and holding > locks stifles concurrency, so holding less it better (yes, I said > _often_, there likely are counter cases but I don't believe configfs is > one of them). This isn't about concurrency or speed. This is about safety while configfs is attaching or (especially) detaching config_items from the filesystem view it presents. When the VFS travels down a path, it unlocks the trailing directory. We can't do that when tearing down default groups, because we need to lock that small hunk and tear it out atomically. > Anyway - I'm against just turning lockdep off, that will make folks > complacent and let the stuff rot to bits inside - and I for one will > refuse to run anything using it (but since that only seems to be > dlm/ocfs and I'm of the believe that centralized cluster stuff sucks > rocks anyway that won't be a problem). Oh, be nice :-) You are absolutely right that turning off lockdep leaves the possibility of complacency and bitrot. That's precisely why I didn't like Louis' subclass solution - again, bitrot might go unnoticed. Now, I know that I will be paying attention to the locking and going over it with a fine-toothed comb. But I'd much rather have an actual working lockdep analysis. Whether that means we find a way for lockdep to describe what's happening here, or we find another way to keep folks out of the tree we're removing, I don't care. Joel -- Life's Little Instruction Book #109 "Know how to drive a stick shift." Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [Cluster-devel] Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2008-12-18 9:27 ` Joel Becker @ 2008-12-18 11:26 ` Steven Whitehouse [not found] ` <1229601399.9487.218.camel@twins> 1 sibling, 0 replies; 8+ messages in thread From: Steven Whitehouse @ 2008-12-18 11:26 UTC (permalink / raw) To: cluster-devel.redhat.com Hi, I have to say that when I brought this subject up, I didn't realise quite how tricky this was going to be, however... On Thu, 2008-12-18 at 01:27 -0800, Joel Becker wrote: > On Thu, Dec 18, 2008 at 08:26:48AM +0100, Peter Zijlstra wrote: > > On Wed, 2008-12-17 at 13:40 -0800, Andrew Morton wrote: > > > On Fri, 12 Dec 2008 16:29:11 +0100 > > > Louis Rilling <louis.rilling@kerlabs.com> wrote: > > > > >From inside configfs it is not possible to serialize those recursive > > > > locking with a top-level one, because mkdir() and rmdir() are already > > > > called with inodes locked by the VFS. So using some > > > > mutex_lock_nest_lock() is not an option. > > <snip> > <more snipping> > > Right, so basically we avoid syscalls by making vfs ops do stuff.. > > lovely - still not seeing it though - regular filesystems seems to cope > > just fine and they get to create arbitrary tree structures too. > > It's about the default_groups and how they build up and tear > down small bits of tree. > A simple creation of a config_item, a mkdir(2), is a normal VFS > lock set and doesn't make lockdep unhappy. But if the new config_item > has a default_group or two, they need locking too. Not so much on > mkdir(2), but on rmdir(2). > So if I've understood this correctly, the dentries created upon mkdir live until such time as they are removed at some later date, presumably with rmdir? When creating the tree, would it be possible to build it starting from the bottom and working towards the point of attachment and then to not actually attach it until the last moment? That way it would not be visible from userland until it was linked into the existing dir and that solves the locking issue for mkdir I think. As you say, rmdir seems the harder problem, but again, is it possible to separate the unlink operation from the destruction of the tree by keeping the tree, after its been unlinked, until the last userland reference has gone away? At least I assume that is why the locking is there. I may be a bit off the mark, but it seems like this is quite similar to how the VFS does umount, so maybe there are some hints in that code which may help us solve this issue? Steve. ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <1229601399.9487.218.camel@twins>]
[parent not found: <1229603308.9487.227.camel@twins>]
* [Cluster-devel] Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() [not found] ` <1229603308.9487.227.camel@twins> @ 2008-12-18 22:58 ` Joel Becker [not found] ` <1232973009.4863.76.camel@laptop> 0 siblings, 1 reply; 8+ messages in thread From: Joel Becker @ 2008-12-18 22:58 UTC (permalink / raw) To: cluster-devel.redhat.com On Thu, Dec 18, 2008 at 01:28:28PM +0100, Peter Zijlstra wrote: > In fact, both (configfs) mkdir and rmdir seem to synchronize on > su_mutex.. > > mkdir B/C/bar > > C.i_mutex > su_mutex > > vs > > rmdir foo > > parent(foo).i_mutex > foo.i_mutex > su_mutex > > > once holding the rmdir su_mutex you can check foo's user-content, since > any mkdir will be blocked. All you have to do is then re-validate in > mkdir's su_mutex that !IS_DEADDIR(C). We explicitly do not take any i_mutex locks after taking su_mutex. That's an ABBA risk. su_mutex protects the hierarchy of config_items. i_mutex protects the vfs view thereof. If you look in mkdir, we take su_mutex, get a new item from the client subsystem, then drop su_mutex. After that, we go about building our filesystem structure, using i_mutex where appropriate. More importantly is rmdir(2), where we use i_mutex in configfs_detach_group(), but are not holding su_sem. Only when configfs_detach_group() has successfully returned and we have torn down the filesystem structure do we take su_mutex and tear down the config_item structure. In fact, we're part of the way there. Check out that USET_DROPPING flag we set in detach_prep() while scanning for user objects. That flags us racing mkdir(2). When we are done with detach_prep(), we know that mkdir(2) calls racing behind us will do nothing until we safely lock them out with the locking in detach_group(). All mkdir(2) calls will have exited by the time we get the mutex, and no new mkdir(2) call can start because we have the mutex. Now look in detach_groups(). We drop the groups children before marking them DEAD. Louis' plan, I think, is to perhaps mark a group DEAD, disconnect it from the vfs, and then operate on its children. In this fashion, perhaps we can unlock the trailing lock like a normal VFS operation. This will require some serious auditing, however, because now vfs functions can get into the vfs objects behind us. And more vfs changes affect us. Whereas the current locking relies on the vfs's parent->child lock ordering only, something that isn't likely to be changed. Joel -- "You cannot bring about prosperity by discouraging thrift. You cannot strengthen the weak by weakening the strong. You cannot help the wage earner by pulling down the wage payer. You cannot further the brotherhood of man by encouraging class hatred. You cannot help the poor by destroying the rich. You cannot build character and courage by taking away a man's initiative and independence. You cannot help men permanently by doing for them what they could and should do for themselves." - Abraham Lincoln Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <1232973009.4863.76.camel@laptop>]
[parent not found: <20090126132453.GD7532@hawkmoon.kerlabs.com>]
[parent not found: <1232977283.4863.79.camel@laptop>]
[parent not found: <20090126140032.GE7532@hawkmoon.kerlabs.com>]
[parent not found: <1232979562.4863.101.camel@laptop>]
[parent not found: <20090126145536.GG7532@hawkmoon.kerlabs.com>]
* [Cluster-devel] Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() [not found] ` <20090126145536.GG7532@hawkmoon.kerlabs.com> @ 2009-01-28 3:05 ` Joel Becker 0 siblings, 0 replies; 8+ messages in thread From: Joel Becker @ 2009-01-28 3:05 UTC (permalink / raw) To: cluster-devel.redhat.com Thank you both for keeping up on this. I owe Louis some review that I'm going to try to get to. On Mon, Jan 26, 2009 at 03:55:36PM +0100, Louis Rilling wrote: > On 26/01/09 15:19 +0100, Peter Zijlstra wrote: > > On Mon, 2009-01-26 at 15:00 +0100, Louis Rilling wrote: > > > > > > Its not a locking correctness thing, but simply not being able to do it > > > > from the vfs calls because those assume locks held? > > > > > > > > Can't you simply punt the work to a worklet once you've created/removed > > > > the non-default group, which can be done from within the vfs callback ? > > > > > > I'm not sure to understand your suggestion. Is this: > > > 1) for mkdir(), create the non-default group, but without its default groups, > > > and defer their creation to a worker which won't have constraints on locks held > > > by any caller; > > > 2) for rmdir(), unlink the non-default group, but without unlinking its default > > > groups, and defer the recursive work to a lock-free context? > > > > > > For mkdir(), this may work. Maybe a bit confusing for userspace, since mkdir(A) > > > returns as soon as A is created, but A may be populated later and userspace may > > > rely on A being populated as soon as it is created (current behavior). As a > > > configfs user, this makes my life harder... > > > > Right, so that is the whole crux of the matter? The "appearance" of the entire default group hierarchy should be atomic. When mkdir(2) returns, it is all there. This does make our lives a little difficult inside configfs, but it makes everyone else's lives much easier. > Probably not. I'm not the maintainer of configfs, but I guess that Joel is a bit > reluctant to deeply rework parts of something that actually works (conflicts > with lockdep excepted). That is true - it works, and safely, and the lockdep conflict is the only real known issue. I'm not wary of changing it, I'm only wary of breaking it. That is, I want to understand what is being changed and be sure that we're keeping the correctness we have so far. I don't want to change it and "hope" we got it right, you know? This makes me cautious, but don't think I'm against change. As I stated, having lockdep work for us is a good thing. More replies to the other mails coming. Joel -- You can use a screwdriver to screw in screws or to clean your ears, however, the latter needs real skill, determination and a lack of fear of injuring yourself. It is much the same with JavaScript. - Chris Heilmann Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [Cluster-devel] Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() [not found] ` <1232973009.4863.76.camel@laptop> [not found] ` <20090126132453.GD7532@hawkmoon.kerlabs.com> @ 2009-01-28 3:41 ` Joel Becker 1 sibling, 0 replies; 8+ messages in thread From: Joel Becker @ 2009-01-28 3:41 UTC (permalink / raw) To: cluster-devel.redhat.com On Mon, Jan 26, 2009 at 01:30:09PM +0100, Peter Zijlstra wrote: > I don't think I was suggesting that. All you need is to serialize any > mkdir/creat against the rmdir of the youngest non-default group, and you > can do that by holding su_mutex. > > In rmdir, you already own all the i_mutex instances you need to uncouple > the whole tree, all you need to do is validate that its indeed empty -- > you don't need i_mutex's for that, because you're holding su_mutex, and > any concurrent mkdir/creat will be blocking on that. > > If you find it empty, just mark everybody DEAD, drop su_mutex and > decouple. All concurrent mkdir/creat thingies that were blocking will > now bail because their parent is found DEAD. Right, that's what I was talking about when I said: > > Now look in detach_groups(). We drop the groups children before > > marking them DEAD. Louis' plan, I think, is to perhaps mark a group > > DEAD, disconnect it from the vfs, and then operate on its children. In > > this fashion, perhaps we can unlock the trailing lock like a normal VFS > > operation. > > This will require some serious auditing, however, because now > > vfs functions can get into the vfs objects behind us. And more vfs > > changes affect us. Whereas the current locking relies on the vfs's > > parent->child lock ordering only, something that isn't likely to be > > changed. That is, the vfs has already walked past this directory, dropped i_mutex, and is in a child default group holding its i_mutex. It wants to mkdir(2) down there. You're saying that, if mkdir(2) holds su_mutex higher up, it can check S_DEAD and compare with us, and that's exactly the scheme I mentioned in the first of the quoted paragraphs (Louis proposed it a while back). Thus, though we don't hold i_mutex on that child, it will eventually either a) have gotten su_mutex first, and will cause rmdir to ENOTEMPTY or b) have gotten su_mutex second, will see S_DEAD, and return -ENOENT. As far as preventing mkdir(2), I don't see why it wouldn't work. The issue is in that second quoted paragraph. We know that the VFS can look up our children if we're not holding i_mutex. In fact, cached_lookup() can find them without i_mutex. Now, we know that mkdir(2) and rmdir(2) will block at su_mutex. But what about all other file operations, both on the child directories *and* the attribute files? For attribute files, we prevent access at creation time with a flag. We can trust the flag because we hold i_mutex. This might hold anyway because we're holding that toplevel i_mutex. At teardown time, though, we know they can't be found because of i_mutex. Now we don't have that protection for processes that are farther down the tree. But the bigger issue is just the plain regular operations on our directories. An example is ->readdir(). We currently lock out ->readdir() during rmdir(2) with our holding of i_mutex. However, if we are not holding i_mutex, vfs_readdir() can call into our ->readdir() right as we're tearing things down. We may not have gotten to S_DEAD yet in the rmdir(2) path, and the two will race. We can't take su_mutex in ->readdir(), because that sort of solution effectively says "we have to take su_mutex for all operations", and we end up serializing all operations on a subsustem. Now, I can think of a way to make ->readdir() safe without su_mutex. But what other operation is next? How do I know I have them all? How do I notice when someone adds a new operation or code path to the generic code that I have to protect against? With i_mutex, I *know* that everyone has agreed that is the gatekeeper. Without it... *That's* the big worry. That's what I'm worried about being sure of. I'd love to hear a solution that we know will work, or at least move towards one. Joel PS: And we haven't even talked about configfs_depend_item() yet. -- Life's Little Instruction Book #252 "Take good care of those you love." Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2009-01-28 3:41 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20081212100615.GD19128@hawkmoon.kerlabs.com>
[not found] ` <1229095751-23984-1-git-send-email-louis.rilling@kerlabs.com>
2008-12-17 21:40 ` [Cluster-devel] Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() Andrew Morton
2008-12-17 22:03 ` Joel Becker
2008-12-17 22:09 ` Andrew Morton
[not found] ` <1229585208.9487.112.camel@twins>
2008-12-18 9:27 ` Joel Becker
2008-12-18 11:26 ` Steven Whitehouse
[not found] ` <1229601399.9487.218.camel@twins>
[not found] ` <1229603308.9487.227.camel@twins>
2008-12-18 22:58 ` Joel Becker
[not found] ` <1232973009.4863.76.camel@laptop>
[not found] ` <20090126132453.GD7532@hawkmoon.kerlabs.com>
[not found] ` <1232977283.4863.79.camel@laptop>
[not found] ` <20090126140032.GE7532@hawkmoon.kerlabs.com>
[not found] ` <1232979562.4863.101.camel@laptop>
[not found] ` <20090126145536.GG7532@hawkmoon.kerlabs.com>
2009-01-28 3:05 ` Joel Becker
2009-01-28 3:41 ` Joel Becker
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).