* Is lockref_get_not_zero() really correct in dget_parent() @ 2014-08-05 3:17 NeilBrown 2014-08-05 4:07 ` Linus Torvalds 0 siblings, 1 reply; 7+ messages in thread From: NeilBrown @ 2014-08-05 3:17 UTC (permalink / raw) To: Al Viro, Linus Torvalds, linux-fsdevel, lkml [-- Attachment #1: Type: text/plain, Size: 1249 bytes --] Hi, I've been looking at last year's change to dentry refcounting which sets the refcount to -128 (mark_dead()) when the dentry is gone. As this is an "unsigned long" and there are several places where d_lockref.count is compared e.g. "> 1", I start feeling uncomfortable, as "-128" is greater than "1". Most of them look safe because there is locking in place and d_lockref.count will never be seen as "-128" unless you get the reference with only rcu_read_lock(). That brings me to dget_parent(). It only has rcu_read_lock() protection, and yet uses lockref_get_not_zero(). This doesn't seem safe. Is there a reason that it is safe that I'm not seeing? Or is the following needed? Thanks, NeilBrown Signed-off-by: NeilBrown <neilb@suse.de> diff --git a/fs/dcache.c b/fs/dcache.c index 06f65857a855..c48ce95a38f2 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -699,7 +699,7 @@ struct dentry *dget_parent(struct dentry *dentry) */ rcu_read_lock(); ret = ACCESS_ONCE(dentry->d_parent); - gotref = lockref_get_not_zero(&ret->d_lockref); + gotref = lockref_get_not_dead(&ret->d_lockref); rcu_read_unlock(); if (likely(gotref)) { if (likely(ret == ACCESS_ONCE(dentry->d_parent))) [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: Is lockref_get_not_zero() really correct in dget_parent() 2014-08-05 3:17 Is lockref_get_not_zero() really correct in dget_parent() NeilBrown @ 2014-08-05 4:07 ` Linus Torvalds 2014-08-05 4:54 ` Steven Noonan 2014-08-05 5:14 ` Al Viro 0 siblings, 2 replies; 7+ messages in thread From: Linus Torvalds @ 2014-08-05 4:07 UTC (permalink / raw) To: NeilBrown; +Cc: Al Viro, linux-fsdevel, lkml On Mon, Aug 4, 2014 at 8:17 PM, NeilBrown <neilb@suse.de> wrote: > > I've been looking at last year's change to dentry refcounting which sets the > refcount to -128 (mark_dead()) when the dentry is gone. > > As this is an "unsigned long" and there are several places where > d_lockref.count is compared e.g. "> 1", I start feeling uncomfortable, as > "-128" is greater than "1". Anybody who checks the lockref count without holding the lock is pretty much buggy by definition. And if you hold the lock, you had better not ever see a negative (== large positive) number, because that would be all kinds of buggy too. So I don't *think* that people who compare with "> 1" kind of things should be problematic. I wouldn't necessarily disagree with the notion of making a lockref be a signed entity, though. It started out unsigned, but it started out without that dead state too, so that unsigned thing can be considered a historical artifact rather than any real design decision. Anyway, I think my argument is that anybody who actually looks at d_count() and might see that magic dead value is so fundamentally broken in other ways that I wouldn't worry too much about *that* part. But your "lockref_get_not_zero()" thing is a different thing: > That brings me to dget_parent(). It only has rcu_read_lock() protection, and > yet uses lockref_get_not_zero(). This doesn't seem safe. Yes, agreed, it's ugly and wrong, and smells bad. But I think it happens to be safe (because the re-checking of d_parent will fail if a rename and dput could have triggered it, and even the extraneous "dput()" is actually safe, because it won't cause the value to become zero, so nothing bad happens. But it *is* kind of subtle, and I do agree that it's *needlessly* so. So it might be a good idea to get rid of the "not zero" version entirely, and make the check be about being *active* (ie not zero, and not dead). The only user of lockref_get_not_zero() is that dget_parent() thing, so that should be easy. So renaming it to "lockref_get_active()", and changing the "not zero" test to check for "positive" and change the rtype of "count" to be signed, all sound like good things to me. But I don't actually think it's an active bug, it's just an "active horribly ugly and subtly working code". I guess in theory if you can get lots of CPU's triggering the race at the same time, the magic negative number could become zero and positive, but at that point I don't think we're really talking reality any more. Can somebody pick holes in that? Does somebody want to send in the cleanup patch? Linus ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Is lockref_get_not_zero() really correct in dget_parent() 2014-08-05 4:07 ` Linus Torvalds @ 2014-08-05 4:54 ` Steven Noonan 2014-08-05 4:56 ` Steven Noonan 2014-08-05 5:14 ` Al Viro 1 sibling, 1 reply; 7+ messages in thread From: Steven Noonan @ 2014-08-05 4:54 UTC (permalink / raw) To: Linus Torvalds; +Cc: NeilBrown, Al Viro, linux-fsdevel, lkml On Mon, Aug 04, 2014 at 09:07:32PM -0700, Linus Torvalds wrote: > On Mon, Aug 4, 2014 at 8:17 PM, NeilBrown <neilb@suse.de> wrote: > > > > I've been looking at last year's change to dentry refcounting which sets the > > refcount to -128 (mark_dead()) when the dentry is gone. > > > > As this is an "unsigned long" and there are several places where > > d_lockref.count is compared e.g. "> 1", I start feeling uncomfortable, as > > "-128" is greater than "1". > > Anybody who checks the lockref count without holding the lock is > pretty much buggy by definition. And if you hold the lock, you had > better not ever see a negative (== large positive) number, because > that would be all kinds of buggy too. > > So I don't *think* that people who compare with "> 1" kind of things > should be problematic. I wouldn't necessarily disagree with the notion > of making a lockref be a signed entity, though. It started out > unsigned, but it started out without that dead state too, so that > unsigned thing can be considered a historical artifact rather than any > real design decision. > > Anyway, I think my argument is that anybody who actually looks at > d_count() and might see that magic dead value is so fundamentally > broken in other ways that I wouldn't worry too much about *that* part. > > But your "lockref_get_not_zero()" thing is a different thing: > > > That brings me to dget_parent(). It only has rcu_read_lock() protection, and > > yet uses lockref_get_not_zero(). This doesn't seem safe. > > Yes, agreed, it's ugly and wrong, and smells bad. > > But I think it happens to be safe (because the re-checking of d_parent > will fail if a rename and dput could have triggered it, and even the > extraneous "dput()" is actually safe, because it won't cause the value > to become zero, so nothing bad happens. But it *is* kind of subtle, > and I do agree that it's *needlessly* so. > > So it might be a good idea to get rid of the "not zero" version > entirely, and make the check be about being *active* (ie not zero, and > not dead). > > The only user of lockref_get_not_zero() is that dget_parent() thing, > so that should be easy. > > So renaming it to "lockref_get_active()", and changing the "not zero" > test to check for "positive" and change the rtype of "count" to be > signed, all sound like good things to me. > > But I don't actually think it's an active bug, it's just an "active > horribly ugly and subtly working code". I guess in theory if you can > get lots of CPU's triggering the race at the same time, the magic > negative number could become zero and positive, but at that point I > don't think we're really talking reality any more. > > Can somebody pick holes in that? Does somebody want to send in the > cleanup patch? How does this look? diff --git a/fs/dcache.c b/fs/dcache.c index e99c6f5..1e7dc31 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -479,7 +479,7 @@ static void __dentry_kill(struct dentry *dentry) * dentry_iput drops the locks, at which point nobody (except * transient RCU lookups) can reach this dentry. */ - BUG_ON((int)dentry->d_lockref.count > 0); + BUG_ON(dentry->d_lockref.count > 0); this_cpu_dec(nr_dentry); if (dentry->d_op && dentry->d_op->d_release) dentry->d_op->d_release(dentry); @@ -532,7 +532,7 @@ static inline struct dentry *lock_parent(struct dentry *dentry) struct dentry *parent = dentry->d_parent; if (IS_ROOT(dentry)) return NULL; - if (unlikely((int)dentry->d_lockref.count < 0)) + if (unlikely(dentry->d_lockref.count < 0)) return NULL; if (likely(spin_trylock(&parent->d_lock))) return parent; @@ -699,7 +699,7 @@ struct dentry *dget_parent(struct dentry *dentry) */ rcu_read_lock(); ret = ACCESS_ONCE(dentry->d_parent); - gotref = lockref_get_not_zero(&ret->d_lockref); + gotref = lockref_get_active(&ret->d_lockref); rcu_read_unlock(); if (likely(gotref)) { if (likely(ret == ACCESS_ONCE(dentry->d_parent))) @@ -848,7 +848,7 @@ static void shrink_dentry_list(struct list_head *list) * We found an inuse dentry which was not removed from * the LRU because of laziness during lookup. Do not free it. */ - if ((int)dentry->d_lockref.count > 0) { + if (dentry->d_lockref.count > 0) { spin_unlock(&dentry->d_lock); if (parent) spin_unlock(&parent->d_lock); diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index aec7f73..d492f0e 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -1745,7 +1745,7 @@ void gfs2_dump_glock(struct seq_file *seq, const struct gfs2_glock *gl) state2str(gl->gl_demote_state), dtime, atomic_read(&gl->gl_ail_count), atomic_read(&gl->gl_revokes), - (int)gl->gl_lockref.count, gl->gl_hold_time); + gl->gl_lockref.count, gl->gl_hold_time); list_for_each_entry(gh, &gl->gl_holders, gh_list) dump_holder(seq, gh); diff --git a/include/linux/lockref.h b/include/linux/lockref.h index 4bfde0e..1a9827e 100644 --- a/include/linux/lockref.h +++ b/include/linux/lockref.h @@ -28,13 +28,13 @@ struct lockref { #endif struct { spinlock_t lock; - unsigned int count; + int count; }; }; }; extern void lockref_get(struct lockref *); -extern int lockref_get_not_zero(struct lockref *); +extern int lockref_get_active(struct lockref *); extern int lockref_get_or_lock(struct lockref *); extern int lockref_put_or_lock(struct lockref *); diff --git a/lib/lockref.c b/lib/lockref.c index f07a40d..7f30371 100644 --- a/lib/lockref.c +++ b/lib/lockref.c @@ -61,17 +61,18 @@ void lockref_get(struct lockref *lockref) EXPORT_SYMBOL(lockref_get); /** - * lockref_get_not_zero - Increments count unless the count is 0 + * lockref_get_active - Increments count unless the count is 0 or ref is dead * @lockref: pointer to lockref structure - * Return: 1 if count updated successfully or 0 if count was zero + * Return: 1 if count updated successfully or 0 if count was zero or lockref + * was dead */ -int lockref_get_not_zero(struct lockref *lockref) +int lockref_get_active(struct lockref *lockref) { int retval; CMPXCHG_LOOP( new.count++; - if (!old.count) + if (old.count < 1) return 0; , return 1; @@ -79,14 +80,14 @@ int lockref_get_not_zero(struct lockref *lockref) spin_lock(&lockref->lock); retval = 0; - if (lockref->count) { + if (lockref->count >= 1) { lockref->count++; retval = 1; } spin_unlock(&lockref->lock); return retval; } -EXPORT_SYMBOL(lockref_get_not_zero); +EXPORT_SYMBOL(lockref_get_active); /** * lockref_get_or_lock - Increments count unless the count is 0 @@ -159,7 +160,7 @@ int lockref_get_not_dead(struct lockref *lockref) CMPXCHG_LOOP( new.count++; - if ((int)old.count < 0) + if (old.count < 0) return 0; , return 1; @@ -167,7 +168,7 @@ int lockref_get_not_dead(struct lockref *lockref) spin_lock(&lockref->lock); retval = 0; - if ((int) lockref->count >= 0) { + if (lockref->count >= 0) { lockref->count++; retval = 1; } I'm not too happy with the documentation string for lockref_get_active(), but I believe the logic is at least right. There were a few places where 'count' was being casted to a signed integer, and since this change turns 'count' into a signed value anyway, I've stripped out the casts. ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: Is lockref_get_not_zero() really correct in dget_parent() 2014-08-05 4:54 ` Steven Noonan @ 2014-08-05 4:56 ` Steven Noonan 2014-08-05 16:53 ` Linus Torvalds 0 siblings, 1 reply; 7+ messages in thread From: Steven Noonan @ 2014-08-05 4:56 UTC (permalink / raw) To: Linus Torvalds; +Cc: NeilBrown, Al Viro, linux-fsdevel, lkml On Mon, Aug 4, 2014 at 9:54 PM, Steven Noonan <steven@uplinklabs.net> wrote: > On Mon, Aug 04, 2014 at 09:07:32PM -0700, Linus Torvalds wrote: >> On Mon, Aug 4, 2014 at 8:17 PM, NeilBrown <neilb@suse.de> wrote: >> > >> > I've been looking at last year's change to dentry refcounting which sets the >> > refcount to -128 (mark_dead()) when the dentry is gone. >> > >> > As this is an "unsigned long" and there are several places where >> > d_lockref.count is compared e.g. "> 1", I start feeling uncomfortable, as >> > "-128" is greater than "1". >> >> Anybody who checks the lockref count without holding the lock is >> pretty much buggy by definition. And if you hold the lock, you had >> better not ever see a negative (== large positive) number, because >> that would be all kinds of buggy too. >> >> So I don't *think* that people who compare with "> 1" kind of things >> should be problematic. I wouldn't necessarily disagree with the notion >> of making a lockref be a signed entity, though. It started out >> unsigned, but it started out without that dead state too, so that >> unsigned thing can be considered a historical artifact rather than any >> real design decision. >> >> Anyway, I think my argument is that anybody who actually looks at >> d_count() and might see that magic dead value is so fundamentally >> broken in other ways that I wouldn't worry too much about *that* part. >> >> But your "lockref_get_not_zero()" thing is a different thing: >> >> > That brings me to dget_parent(). It only has rcu_read_lock() protection, and >> > yet uses lockref_get_not_zero(). This doesn't seem safe. >> >> Yes, agreed, it's ugly and wrong, and smells bad. >> >> But I think it happens to be safe (because the re-checking of d_parent >> will fail if a rename and dput could have triggered it, and even the >> extraneous "dput()" is actually safe, because it won't cause the value >> to become zero, so nothing bad happens. But it *is* kind of subtle, >> and I do agree that it's *needlessly* so. >> >> So it might be a good idea to get rid of the "not zero" version >> entirely, and make the check be about being *active* (ie not zero, and >> not dead). >> >> The only user of lockref_get_not_zero() is that dget_parent() thing, >> so that should be easy. >> >> So renaming it to "lockref_get_active()", and changing the "not zero" >> test to check for "positive" and change the rtype of "count" to be >> signed, all sound like good things to me. >> >> But I don't actually think it's an active bug, it's just an "active >> horribly ugly and subtly working code". I guess in theory if you can >> get lots of CPU's triggering the race at the same time, the magic >> negative number could become zero and positive, but at that point I >> don't think we're really talking reality any more. >> >> Can somebody pick holes in that? Does somebody want to send in the >> cleanup patch? > > How does this look? > > > diff --git a/fs/dcache.c b/fs/dcache.c > index e99c6f5..1e7dc31 100644 > --- a/fs/dcache.c > +++ b/fs/dcache.c > @@ -479,7 +479,7 @@ static void __dentry_kill(struct dentry *dentry) > * dentry_iput drops the locks, at which point nobody (except > * transient RCU lookups) can reach this dentry. > */ > - BUG_ON((int)dentry->d_lockref.count > 0); > + BUG_ON(dentry->d_lockref.count > 0); > this_cpu_dec(nr_dentry); > if (dentry->d_op && dentry->d_op->d_release) > dentry->d_op->d_release(dentry); > @@ -532,7 +532,7 @@ static inline struct dentry *lock_parent(struct dentry *dentry) > struct dentry *parent = dentry->d_parent; > if (IS_ROOT(dentry)) > return NULL; > - if (unlikely((int)dentry->d_lockref.count < 0)) > + if (unlikely(dentry->d_lockref.count < 0)) > return NULL; > if (likely(spin_trylock(&parent->d_lock))) > return parent; > @@ -699,7 +699,7 @@ struct dentry *dget_parent(struct dentry *dentry) > */ > rcu_read_lock(); > ret = ACCESS_ONCE(dentry->d_parent); > - gotref = lockref_get_not_zero(&ret->d_lockref); > + gotref = lockref_get_active(&ret->d_lockref); > rcu_read_unlock(); > if (likely(gotref)) { > if (likely(ret == ACCESS_ONCE(dentry->d_parent))) > @@ -848,7 +848,7 @@ static void shrink_dentry_list(struct list_head *list) > * We found an inuse dentry which was not removed from > * the LRU because of laziness during lookup. Do not free it. > */ > - if ((int)dentry->d_lockref.count > 0) { > + if (dentry->d_lockref.count > 0) { > spin_unlock(&dentry->d_lock); > if (parent) > spin_unlock(&parent->d_lock); > diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c > index aec7f73..d492f0e 100644 > --- a/fs/gfs2/glock.c > +++ b/fs/gfs2/glock.c > @@ -1745,7 +1745,7 @@ void gfs2_dump_glock(struct seq_file *seq, const struct gfs2_glock *gl) > state2str(gl->gl_demote_state), dtime, > atomic_read(&gl->gl_ail_count), > atomic_read(&gl->gl_revokes), > - (int)gl->gl_lockref.count, gl->gl_hold_time); > + gl->gl_lockref.count, gl->gl_hold_time); > > list_for_each_entry(gh, &gl->gl_holders, gh_list) > dump_holder(seq, gh); Er, grep failed me. The above delta to gfs2 is obviously bogus. > diff --git a/include/linux/lockref.h b/include/linux/lockref.h > index 4bfde0e..1a9827e 100644 > --- a/include/linux/lockref.h > +++ b/include/linux/lockref.h > @@ -28,13 +28,13 @@ struct lockref { > #endif > struct { > spinlock_t lock; > - unsigned int count; > + int count; > }; > }; > }; > > extern void lockref_get(struct lockref *); > -extern int lockref_get_not_zero(struct lockref *); > +extern int lockref_get_active(struct lockref *); > extern int lockref_get_or_lock(struct lockref *); > extern int lockref_put_or_lock(struct lockref *); > > diff --git a/lib/lockref.c b/lib/lockref.c > index f07a40d..7f30371 100644 > --- a/lib/lockref.c > +++ b/lib/lockref.c > @@ -61,17 +61,18 @@ void lockref_get(struct lockref *lockref) > EXPORT_SYMBOL(lockref_get); > > /** > - * lockref_get_not_zero - Increments count unless the count is 0 > + * lockref_get_active - Increments count unless the count is 0 or ref is dead > * @lockref: pointer to lockref structure > - * Return: 1 if count updated successfully or 0 if count was zero > + * Return: 1 if count updated successfully or 0 if count was zero or lockref > + * was dead > */ > -int lockref_get_not_zero(struct lockref *lockref) > +int lockref_get_active(struct lockref *lockref) > { > int retval; > > CMPXCHG_LOOP( > new.count++; > - if (!old.count) > + if (old.count < 1) > return 0; > , > return 1; > @@ -79,14 +80,14 @@ int lockref_get_not_zero(struct lockref *lockref) > > spin_lock(&lockref->lock); > retval = 0; > - if (lockref->count) { > + if (lockref->count >= 1) { > lockref->count++; > retval = 1; > } > spin_unlock(&lockref->lock); > return retval; > } > -EXPORT_SYMBOL(lockref_get_not_zero); > +EXPORT_SYMBOL(lockref_get_active); > > /** > * lockref_get_or_lock - Increments count unless the count is 0 > @@ -159,7 +160,7 @@ int lockref_get_not_dead(struct lockref *lockref) > > CMPXCHG_LOOP( > new.count++; > - if ((int)old.count < 0) > + if (old.count < 0) > return 0; > , > return 1; > @@ -167,7 +168,7 @@ int lockref_get_not_dead(struct lockref *lockref) > > spin_lock(&lockref->lock); > retval = 0; > - if ((int) lockref->count >= 0) { > + if (lockref->count >= 0) { > lockref->count++; > retval = 1; > } > > > I'm not too happy with the documentation string for lockref_get_active(), but > I believe the logic is at least right. There were a few places where 'count' > was being casted to a signed integer, and since this change turns 'count' into > a signed value anyway, I've stripped out the casts. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Is lockref_get_not_zero() really correct in dget_parent() 2014-08-05 4:56 ` Steven Noonan @ 2014-08-05 16:53 ` Linus Torvalds 0 siblings, 0 replies; 7+ messages in thread From: Linus Torvalds @ 2014-08-05 16:53 UTC (permalink / raw) To: Steven Noonan; +Cc: NeilBrown, Al Viro, linux-fsdevel, lkml On Mon, Aug 4, 2014 at 9:56 PM, Steven Noonan <steven@uplinklabs.net> wrote: > > Er, grep failed me. The above delta to gfs2 is obviously bogus. Yeah, and looking at the patch, I think the "make lockref count signed" should be a separate patch. Linus ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Is lockref_get_not_zero() really correct in dget_parent() 2014-08-05 4:07 ` Linus Torvalds 2014-08-05 4:54 ` Steven Noonan @ 2014-08-05 5:14 ` Al Viro 2014-08-05 22:58 ` Eric W. Biederman 1 sibling, 1 reply; 7+ messages in thread From: Al Viro @ 2014-08-05 5:14 UTC (permalink / raw) To: Linus Torvalds; +Cc: NeilBrown, linux-fsdevel, lkml On Mon, Aug 04, 2014 at 09:07:32PM -0700, Linus Torvalds wrote: > So renaming it to "lockref_get_active()", and changing the "not zero" > test to check for "positive" and change the rtype of "count" to be > signed, all sound like good things to me. Fine by me. I can do that, unless somebody else beats me to that. > But I don't actually think it's an active bug, it's just an "active > horribly ugly and subtly working code". I guess in theory if you can > get lots of CPU's triggering the race at the same time, the magic > negative number could become zero and positive, but at that point I > don't think we're really talking reality any more. > > Can somebody pick holes in that? Does somebody want to send in the > cleanup patch? FWIW, dget_parent() has been a bloody annoyance - most of the callers used to be racy and looking through the current set... Take a look at e.g. fs/btrfs/tree-log.c:check_parent_dirs_for_sync(). In the loop there we do if (!parent || !parent->d_inode || sb != parent->d_inode->i_sb) break; if (IS_ROOT(parent)) break; parent = dget_parent(parent); dput(old_parent); old_parent = parent; inode = parent->d_inode; and it's so bogus it's not even funny. What the hell is that code trying to check? And while we are at it, can we race with renames there? >From my reading of that code, most of it ought to have been under rcu_read_lock() and sod all that playing with refcounts. Other btrfs users are not much nicer (who says, for instance, that result of check_parent_dirs_for_sync() will remain valid?) ecryptfs lock_parent(): AFAICS, all callers could do ecryptfs_dentry_to_lower() on dentry->d_parent instead of doing ecryptfs_dentry_to_lower(dentry) and then doing dget_parent() - the covering layer dentries have ->d_parent stabilized by ->i_mutex on said parents and we really, really don't want to run into the case when tree topologies go out of sync. I.e. we want to grab ->i_mutex on lower(parent), then check that it's equal to parent(lower) and, if it's at all possible that some joker has played with rename(2) on underlying layer mounted somewhere else, drop everything we'd taken and bugger off. XFS xfs_filestream_get_parent() is clearly bogus - parent = dget_parent(dentry); if (!parent) goto out_dput; When does dget_parent() return NULL? And that's just from a quick grep. The point is, dget_parent() is not a nice API - historically it's been abused more often than not and we keep playing whack-a-mole with it. Sigh... Speaking of bogosities, apparently nobody has noticed that automagical turning BSD process accounting off upon r/o remounts of the filesystem hosting the log got broken several years ago. Suppose we find an opened log when remounting. OK, it gets closed, which means filp_close(...). Which means that actual closing that sucker will happen just before we return to userland. Return with -EBUSY, since the filesystem still has a file opened for write when we get around to checking that... I have kinda-sorta fixes for that (basically, restoring the situation we used to have before delayed fput() and being damn careful about avoiding deadlocks), but I would really love to just rip that idiocy out. IOW, just let it fail with -EBUSY in such cases, same as for any other write-opened file on that fs. That, BTW, is the most painful part of the acct series, so being able to drop it would be very nice. Comments? ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Is lockref_get_not_zero() really correct in dget_parent() 2014-08-05 5:14 ` Al Viro @ 2014-08-05 22:58 ` Eric W. Biederman 0 siblings, 0 replies; 7+ messages in thread From: Eric W. Biederman @ 2014-08-05 22:58 UTC (permalink / raw) To: Al Viro; +Cc: Linus Torvalds, NeilBrown, linux-fsdevel, lkml Al Viro <viro@ZenIV.linux.org.uk> writes: > Speaking of bogosities, apparently nobody has noticed that automagical > turning BSD process accounting off upon r/o remounts of the filesystem > hosting the log got broken several years ago. Suppose we find an opened > log when remounting. OK, it gets closed, which means filp_close(...). > Which means that actual closing that sucker will happen just before we > return to userland. Return with -EBUSY, since the filesystem still has > a file opened for write when we get around to checking that... > > I have kinda-sorta fixes for that (basically, restoring the situation we > used to have before delayed fput() and being damn careful about avoiding > deadlocks), but I would really love to just rip that idiocy out. IOW, > just let it fail with -EBUSY in such cases, same as for any other write-opened > file on that fs. > > That, BTW, is the most painful part of the acct series, so being able to > drop it would be very nice. Comments? Long ago in this discucssion someone mentioned that fedora and related distro's don't have (or at least didn't have) code to close the accounting files before unmount during shutdown. I just checked the debain acct package and it very clearly turns off accounting before unmount (so debian should not care). So the practical test is will returning -EBUSY cause fedora based distro's it fail to unmount filesystems cleanly (looking at the code and postualting bsd process accounting being left on) I think such a change will cause such systems to fail to unmount their filesystems on shutdown/reboot and cause problems because of that. Eric ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2014-08-05 22:58 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-08-05 3:17 Is lockref_get_not_zero() really correct in dget_parent() NeilBrown 2014-08-05 4:07 ` Linus Torvalds 2014-08-05 4:54 ` Steven Noonan 2014-08-05 4:56 ` Steven Noonan 2014-08-05 16:53 ` Linus Torvalds 2014-08-05 5:14 ` Al Viro 2014-08-05 22:58 ` Eric W. Biederman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).