* [PATCH][RFC] ecryptfs unlink/rmdir breakage (similar to caught in ecryptfs rename last year) [not found] ` <20191102180842.GN26530@ZenIV.linux.org.uk> @ 2019-11-03 17:05 ` Al Viro [not found] ` <20191103163524.GO26530@ZenIV.linux.org.uk> 1 sibling, 0 replies; 8+ messages in thread From: Al Viro @ 2019-11-03 17:05 UTC (permalink / raw) To: Tyler Hicks; +Cc: ecryptfs, linux-fsdevel, linux-kernel On Sat, Nov 02, 2019 at 06:08:42PM +0000, Al Viro wrote: > There's also some secondary stuff dropping out of that (e.g. ceph seeding > dcache on readdir and blindly unhashing dentries it sees stale instead of > doing d_invalidate() as it ought to - leads to fun results if you had > something mounted on a subdirectory that got removed/recreated on server), > but that's a separate pile of joy - doesn't affect this analysis, so > it'll have to be dealt with later. It had been an interesting couple of > weeks... More secondaries: a piece of nastiness in ecryptfs unlink/rmdir - have the underlying layer mounted elsewhere, look something up via ecryptfs, move the underlying object via that mount elsewhere and call unlink(). Ends up passing the wrong directory inode to vfs_unlink() and then - to ->unlink() of the underlying fs... Embarrassing, since ecryptfs_rename() used to have exact same problem, caught and fixed in "ecryptfs_rename(): verify that lower dentries are still OK after lock_rename()" a year ago. Bugs are like mushrooms - found one, look around for its relatives... How about the following (completely untested) patch? Tyler, do you see any problems in that? ecryptfs: fix unlink and rmdir in face of underlying fs modifications A problem similar to the one caught in commit 74dd7c97ea2a ("ecryptfs_rename(): verify that lower dentries are still OK after lock_rename()") exists for unlink/rmdir as well. Instead of playing with dget_parent() of underlying dentry of victim and hoping it's the same as underlying dentry of our directory, do the following: * find the underlying dentry of victim * find the underlying directory of victim's parent (stable since the victim is ecryptfs dentry and inode of its parent is held exclusive by the caller). * lock the inode of dentry underlying the victim's parent * check that underlying dentry of victim is still hashed and has the right parent - it can be moved, but it can't be moved to/from the directory we are holding exclusive. So while ->d_parent itself might not be stable, the result of comparison is. If the check passes, everything is fine - underlying directory is locked, underlying victim is still a child of that directory and we can go ahead and feed them to vfs_unlink(). As in the current mainline we need to pin the underlying dentry of victim, so that it wouldn't go negative under us, but that's the only temporary reference that needs to be grabbed there. Underlying dentry of parent won't go away (it's pinned by the parent, which is held by caller), so there's no need to grab it. The same problem (with the same solution) exists for rmdir. Moreover, rename gets simpler and more robust with the same "don't bother with dget_parent()" approach. Fixes: 74dd7c97ea2 "ecryptfs_rename(): verify that lower dentries are still OK after lock_rename()" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c index 18426f4855f1..a905d5f4f3b0 100644 --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -128,13 +128,20 @@ static int ecryptfs_do_unlink(struct inode *dir, struct dentry *dentry, struct inode *inode) { struct dentry *lower_dentry = ecryptfs_dentry_to_lower(dentry); - struct inode *lower_dir_inode = ecryptfs_inode_to_lower(dir); struct dentry *lower_dir_dentry; + struct inode *lower_dir_inode; int rc; - dget(lower_dentry); - lower_dir_dentry = lock_parent(lower_dentry); - rc = vfs_unlink(lower_dir_inode, lower_dentry, NULL); + lower_dir_dentry = ecryptfs_dentry_to_lower(dentry->d_parent); + lower_dir_inode = d_inode(lower_dir_dentry); + inode_lock_nested(lower_dir_inode, I_MUTEX_PARENT); + dget(lower_dentry); // don't even try to make the lower negative + if (lower_dentry->d_parent != lower_dir_dentry) + rc = -EINVAL; + else if (d_unhashed(lower_dentry)) + rc = -EINVAL; + else + rc = vfs_unlink(lower_dir_inode, lower_dentry, NULL); if (rc) { printk(KERN_ERR "Error in vfs_unlink; rc = [%d]\n", rc); goto out_unlock; @@ -142,10 +149,11 @@ static int ecryptfs_do_unlink(struct inode *dir, struct dentry *dentry, fsstack_copy_attr_times(dir, lower_dir_inode); set_nlink(inode, ecryptfs_inode_to_lower(inode)->i_nlink); inode->i_ctime = dir->i_ctime; - d_drop(dentry); out_unlock: - unlock_dir(lower_dir_dentry); dput(lower_dentry); + inode_unlock(lower_dir_inode); + if (!rc) + d_drop(dentry); return rc; } @@ -512,22 +520,30 @@ static int ecryptfs_rmdir(struct inode *dir, struct dentry *dentry) { struct dentry *lower_dentry; struct dentry *lower_dir_dentry; + struct inode *lower_dir_inode; int rc; lower_dentry = ecryptfs_dentry_to_lower(dentry); - dget(dentry); - lower_dir_dentry = lock_parent(lower_dentry); - dget(lower_dentry); - rc = vfs_rmdir(d_inode(lower_dir_dentry), lower_dentry); - dput(lower_dentry); - if (!rc && d_really_is_positive(dentry)) + lower_dir_dentry = ecryptfs_dentry_to_lower(dentry->d_parent); + lower_dir_inode = d_inode(lower_dir_dentry); + + inode_lock_nested(lower_dir_inode, I_MUTEX_PARENT); + dget(lower_dentry); // don't even try to make the lower negative + if (lower_dentry->d_parent != lower_dir_dentry) + rc = -EINVAL; + else if (d_unhashed(lower_dentry)) + rc = -EINVAL; + else + rc = vfs_rmdir(lower_dir_inode, lower_dentry); + if (!rc) { clear_nlink(d_inode(dentry)); - fsstack_copy_attr_times(dir, d_inode(lower_dir_dentry)); - set_nlink(dir, d_inode(lower_dir_dentry)->i_nlink); - unlock_dir(lower_dir_dentry); + fsstack_copy_attr_times(dir, lower_dir_inode); + set_nlink(dir, lower_dir_inode->i_nlink); + } + dput(lower_dentry); + inode_unlock(lower_dir_inode); if (!rc) d_drop(dentry); - dput(dentry); return rc; } @@ -565,20 +581,22 @@ ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry, struct dentry *lower_new_dentry; struct dentry *lower_old_dir_dentry; struct dentry *lower_new_dir_dentry; - struct dentry *trap = NULL; + struct dentry *trap; struct inode *target_inode; if (flags) return -EINVAL; + lower_old_dir_dentry = ecryptfs_dentry_to_lower(old_dentry->d_parent); + lower_new_dir_dentry = ecryptfs_dentry_to_lower(new_dentry->d_parent); + lower_old_dentry = ecryptfs_dentry_to_lower(old_dentry); lower_new_dentry = ecryptfs_dentry_to_lower(new_dentry); - dget(lower_old_dentry); - dget(lower_new_dentry); - lower_old_dir_dentry = dget_parent(lower_old_dentry); - lower_new_dir_dentry = dget_parent(lower_new_dentry); + target_inode = d_inode(new_dentry); + trap = lock_rename(lower_old_dir_dentry, lower_new_dir_dentry); + dget(lower_new_dentry); rc = -EINVAL; if (lower_old_dentry->d_parent != lower_old_dir_dentry) goto out_lock; @@ -606,11 +624,8 @@ ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry, if (new_dir != old_dir) fsstack_copy_attr_all(old_dir, d_inode(lower_old_dir_dentry)); out_lock: - unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry); - dput(lower_new_dir_dentry); - dput(lower_old_dir_dentry); dput(lower_new_dentry); - dput(lower_old_dentry); + unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry); return rc; } ^ permalink raw reply related [flat|nested] 8+ messages in thread
[parent not found: <20191103163524.GO26530@ZenIV.linux.org.uk>]
* Re: [RFC] lookup_one_len_unlocked() lousy calling conventions [not found] ` <20191103163524.GO26530@ZenIV.linux.org.uk> @ 2019-11-03 18:20 ` Al Viro 2019-11-03 18:51 ` [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable Al Viro 0 siblings, 1 reply; 8+ messages in thread From: Al Viro @ 2019-11-03 18:20 UTC (permalink / raw) To: linux-fsdevel Cc: Ritesh Harjani, linux-kernel, wugyuan, jlayton, hsiangkao, Jan Kara, Linus Torvalds, ecryptfs On Sun, Nov 03, 2019 at 04:35:24PM +0000, Al Viro wrote: > lookup_one_len_unlocked() calling conventions are wrong for its callers. > Namely, 11 out of 12 callers really want ERR_PTR(-ENOENT) on negatives. > Most of them take care to check, some rely upon that being impossible in > their case. Interactions with dentry turning positive right after > lookup_one_len_unlocked() has returned it are of varying bugginess... > > The only exception is ecryptfs, where we do lookup in the underlying fs > on ecryptfs_lookup() and want to retain a negative dentry if we get one. Looking at that code... the thing that deals with the result of lookup in underlying fs is ecryptfs_lookup_interpose(), and there we have this: struct inode *inode, *lower_inode = d_inode(lower_dentry); ... dentry_info = kmem_cache_alloc(ecryptfs_dentry_info_cache, GFP_KERNEL); ... if (d_really_is_negative(lower_dentry)) { /* We want to add because we couldn't find in lower */ d_add(dentry, NULL); return NULL; } inode = __ecryptfs_get_inode(lower_inode, dentry->d_sb); If lower dentry used to be negative, but went positive while we'd been doing allocation, we'll get d_really_is_negative() (i.e. !lower_dentry->d_inode) false, but lower_inode (fetched earlier) still NULL. __ecryptfs_get_inode() starts with if (lower_inode->i_sb != ecryptfs_superblock_to_lower(sb)) return ERR_PTR(-EXDEV); which won't be happy in that situation... That has nothing to do with barriers, ->d_flags, etc. - the window is rather wide here. GFP_KERNEL allocation can block just fine. IOW, the only caller of lookup_one_len_unlocked() that does not reject negative dentries doesn't manage to handle them correctly ;-/ And then in the same ecryptfs_lookup_interpose() we have e.g. fsstack_copy_attr_atime(d_inode(dentry->d_parent), d_inode(lower_dentry->d_parent)); Now, dentry->d_parent is stable; dentry is guaranteed to be new and not yet visible to anybody else, besides it's negative and the parent is held shared, so it couldn't have been moved around even if it had been seen by somebody else. However, lower_dentry->d_parent is a different story. We are not holding the lock on its parent anymore; it could've been moved around by somebody mounting the underlying layer elsewhere and accessing it directly. Moreover, there's nothing to guarantee that the pointer we fetch from lower_dentry->d_parent won't be pointing to freed memory by the time we get around to looking at its inode - lose the timeslice to preemption just after fetching ->d_parent, have another process move the damn thing around and there's nothing to keep the ex-parent around by the time you regain CPU. The problem goes all way back to addd65ad8d19 "eCryptfs: Filename Encryption: filldir, lookup, and readlink" from 2009. That turned lower_dir_dentry = ecryptfs_dentry_to_lower(dentry->d_parent); into lower_dir_dentry = lower_dentry->d_parent; and it had hit the fan... Sure, "somebody mounted the underlying fs elsewhere and is actively trying to screw us over" is not how ecryptfs is supposed to be used and it can demonstrate unexpected behavior - odd errors, etc. But that behaviour should not include oopsen and access to freed kernel memory... ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable 2019-11-03 18:20 ` [RFC] lookup_one_len_unlocked() lousy calling conventions Al Viro @ 2019-11-03 18:51 ` Al Viro 2019-11-03 19:03 ` [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_parent is not stable either Al Viro 2019-11-13 7:01 ` [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable Amir Goldstein 0 siblings, 2 replies; 8+ messages in thread From: Al Viro @ 2019-11-03 18:51 UTC (permalink / raw) To: linux-fsdevel Cc: Ritesh Harjani, linux-kernel, wugyuan, jlayton, hsiangkao, Jan Kara, Linus Torvalds, ecryptfs lower_dentry can't go from positive to negative (we have it pinned), but it *can* go from negative to positive. So fetching ->d_inode into a local variable, doing a blocking allocation, checking that now ->d_inode is non-NULL and feeding the value we'd fetched earlier to a function that won't accept NULL is not a good idea. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c index a905d5f4f3b0..3c2298721359 100644 --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -319,7 +319,7 @@ static int ecryptfs_i_size_read(struct dentry *dentry, struct inode *inode) static struct dentry *ecryptfs_lookup_interpose(struct dentry *dentry, struct dentry *lower_dentry) { - struct inode *inode, *lower_inode = d_inode(lower_dentry); + struct inode *inode, *lower_inode; struct ecryptfs_dentry_info *dentry_info; struct vfsmount *lower_mnt; int rc = 0; @@ -339,7 +339,15 @@ static struct dentry *ecryptfs_lookup_interpose(struct dentry *dentry, dentry_info->lower_path.mnt = lower_mnt; dentry_info->lower_path.dentry = lower_dentry; - if (d_really_is_negative(lower_dentry)) { + /* + * negative dentry can go positive under us here - its parent is not + * locked. That's OK and that could happen just as we return from + * ecryptfs_lookup() anyway. Just need to be careful and fetch + * ->d_inode only once - it's not stable here. + */ + lower_inode = READ_ONCE(lower_dentry->d_inode); + + if (!lower_inode) { /* We want to add because we couldn't find in lower */ d_add(dentry, NULL); return NULL; ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_parent is not stable either 2019-11-03 18:51 ` [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable Al Viro @ 2019-11-03 19:03 ` Al Viro 2019-11-13 7:01 ` [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable Amir Goldstein 1 sibling, 0 replies; 8+ messages in thread From: Al Viro @ 2019-11-03 19:03 UTC (permalink / raw) To: linux-fsdevel Cc: Ritesh Harjani, linux-kernel, wugyuan, jlayton, hsiangkao, Jan Kara, Linus Torvalds, ecryptfs We need to get the underlying dentry of parent; sure, absent the races it is the parent of underlying dentry, but there's nothing to prevent losing a timeslice to preemtion in the middle of evaluation of lower_dentry->d_parent->d_inode, having another process move lower_dentry around and have its (ex)parent not pinned anymore and freed on memory pressure. Then we regain CPU and try to fetch ->d_inode from memory that is freed by that point. dentry->d_parent *is* stable here - it's an argument of ->lookup() and we are guaranteed that it won't be moved anywhere until we feed it to d_add/d_splice_alias. So we safely go that way to get to its underlying dentry. Cc: stable@vger.kernel.org # since 2009 or so Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c index 3c2298721359..e23752d9a79f 100644 --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -319,9 +319,9 @@ static int ecryptfs_i_size_read(struct dentry *dentry, struct inode *inode) static struct dentry *ecryptfs_lookup_interpose(struct dentry *dentry, struct dentry *lower_dentry) { + struct path *path = ecryptfs_dentry_to_lower_path(dentry->d_parent); struct inode *inode, *lower_inode; struct ecryptfs_dentry_info *dentry_info; - struct vfsmount *lower_mnt; int rc = 0; dentry_info = kmem_cache_alloc(ecryptfs_dentry_info_cache, GFP_KERNEL); @@ -330,13 +330,12 @@ static struct dentry *ecryptfs_lookup_interpose(struct dentry *dentry, return ERR_PTR(-ENOMEM); } - lower_mnt = mntget(ecryptfs_dentry_to_lower_mnt(dentry->d_parent)); fsstack_copy_attr_atime(d_inode(dentry->d_parent), - d_inode(lower_dentry->d_parent)); + d_inode(path->dentry)); BUG_ON(!d_count(lower_dentry)); ecryptfs_set_dentry_private(dentry, dentry_info); - dentry_info->lower_path.mnt = lower_mnt; + dentry_info->lower_path.mnt = mntget(path->mnt); dentry_info->lower_path.dentry = lower_dentry; /* ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable 2019-11-03 18:51 ` [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable Al Viro 2019-11-03 19:03 ` [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_parent is not stable either Al Viro @ 2019-11-13 7:01 ` Amir Goldstein 2019-11-13 12:52 ` Al Viro 1 sibling, 1 reply; 8+ messages in thread From: Amir Goldstein @ 2019-11-13 7:01 UTC (permalink / raw) To: Al Viro Cc: linux-fsdevel, Ritesh Harjani, linux-kernel, wugyuan, Jeff Layton, Gao Xiang, Jan Kara, Linus Torvalds, ecryptfs On Sun, Nov 3, 2019 at 8:52 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > lower_dentry can't go from positive to negative (we have it pinned), > but it *can* go from negative to positive. So fetching ->d_inode > into a local variable, doing a blocking allocation, checking that > now ->d_inode is non-NULL and feeding the value we'd fetched > earlier to a function that won't accept NULL is not a good idea. > > Cc: stable@vger.kernel.org > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> > --- > diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c > index a905d5f4f3b0..3c2298721359 100644 > --- a/fs/ecryptfs/inode.c > +++ b/fs/ecryptfs/inode.c > @@ -319,7 +319,7 @@ static int ecryptfs_i_size_read(struct dentry *dentry, struct inode *inode) > static struct dentry *ecryptfs_lookup_interpose(struct dentry *dentry, > struct dentry *lower_dentry) > { > - struct inode *inode, *lower_inode = d_inode(lower_dentry); > + struct inode *inode, *lower_inode; > struct ecryptfs_dentry_info *dentry_info; > struct vfsmount *lower_mnt; > int rc = 0; > @@ -339,7 +339,15 @@ static struct dentry *ecryptfs_lookup_interpose(struct dentry *dentry, > dentry_info->lower_path.mnt = lower_mnt; > dentry_info->lower_path.dentry = lower_dentry; > > - if (d_really_is_negative(lower_dentry)) { > + /* > + * negative dentry can go positive under us here - its parent is not > + * locked. That's OK and that could happen just as we return from > + * ecryptfs_lookup() anyway. Just need to be careful and fetch > + * ->d_inode only once - it's not stable here. > + */ > + lower_inode = READ_ONCE(lower_dentry->d_inode); > + > + if (!lower_inode) { > /* We want to add because we couldn't find in lower */ > d_add(dentry, NULL); > return NULL; Sigh! Open coding a human readable macro to solve a subtle lookup race. That doesn't sound like a scalable solution. I have a feeling this is not the last patch we will be seeing along those lines. Seeing that developers already confused about when they should use d_really_is_negative() over d_is_negative() [1] and we probably don't want to add d_really_really_is_negative(), how about applying that READ_ONCE into d_really_is_negative() and re-purpose it as a macro to be used when races with lookup are a concern? Thanks, Amir. [1] https://lore.kernel.org/linux-fsdevel/20190903135803.GA25692@hsiangkao-HP-ZHAN-66-Pro-G1/ ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable 2019-11-13 7:01 ` [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable Amir Goldstein @ 2019-11-13 12:52 ` Al Viro 2019-11-13 16:22 ` Amir Goldstein 2019-11-13 20:18 ` Jean-Louis Biasini 0 siblings, 2 replies; 8+ messages in thread From: Al Viro @ 2019-11-13 12:52 UTC (permalink / raw) To: Amir Goldstein Cc: linux-fsdevel, Ritesh Harjani, linux-kernel, wugyuan, Jeff Layton, Gao Xiang, Jan Kara, Linus Torvalds, ecryptfs On Wed, Nov 13, 2019 at 09:01:36AM +0200, Amir Goldstein wrote: > > - if (d_really_is_negative(lower_dentry)) { > > + /* > > + * negative dentry can go positive under us here - its parent is not > > + * locked. That's OK and that could happen just as we return from > > + * ecryptfs_lookup() anyway. Just need to be careful and fetch > > + * ->d_inode only once - it's not stable here. > > + */ > > + lower_inode = READ_ONCE(lower_dentry->d_inode); > > + > > + if (!lower_inode) { > > /* We want to add because we couldn't find in lower */ > > d_add(dentry, NULL); > > return NULL; > > Sigh! > > Open coding a human readable macro to solve a subtle lookup race. > That doesn't sound like a scalable solution. > I have a feeling this is not the last patch we will be seeing along > those lines. > > Seeing that developers already confused about when they should use > d_really_is_negative() over d_is_negative() [1] and we probably > don't want to add d_really_really_is_negative(), how about > applying that READ_ONCE into d_really_is_negative() and > re-purpose it as a macro to be used when races with lookup are > a concern? Would you care to explain what that "fix" would've achieved here, considering the fact that barriers are no-ops on UP and this is *NOT* an SMP race? And it's very much present on UP - we have fetch ->d_inode into local variable do blocking allocation check if ->d_inode is NULL now if it is not, use the value in local variable and expect it to be non-NULL That's not a case of missing barriers. At all. And no redefinition of d_really_is_negative() is going to help - it can't retroactively affect the value explicitly fetched into a local variable some time prior to that. There are other patches dealing with ->d_inode accesses, but they are generally not along the same lines. The problem is rarely the same... ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable 2019-11-13 12:52 ` Al Viro @ 2019-11-13 16:22 ` Amir Goldstein 2019-11-13 20:18 ` Jean-Louis Biasini 1 sibling, 0 replies; 8+ messages in thread From: Amir Goldstein @ 2019-11-13 16:22 UTC (permalink / raw) To: Al Viro Cc: linux-fsdevel, Ritesh Harjani, linux-kernel, wugyuan, Jeff Layton, Gao Xiang, Jan Kara, Linus Torvalds, ecryptfs On Wed, Nov 13, 2019 at 2:52 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Wed, Nov 13, 2019 at 09:01:36AM +0200, Amir Goldstein wrote: > > > - if (d_really_is_negative(lower_dentry)) { > > > + /* > > > + * negative dentry can go positive under us here - its parent is not > > > + * locked. That's OK and that could happen just as we return from > > > + * ecryptfs_lookup() anyway. Just need to be careful and fetch > > > + * ->d_inode only once - it's not stable here. > > > + */ > > > + lower_inode = READ_ONCE(lower_dentry->d_inode); > > > + > > > + if (!lower_inode) { > > > /* We want to add because we couldn't find in lower */ > > > d_add(dentry, NULL); > > > return NULL; > > > > Sigh! > > > > Open coding a human readable macro to solve a subtle lookup race. > > That doesn't sound like a scalable solution. > > I have a feeling this is not the last patch we will be seeing along > > those lines. > > > > Seeing that developers already confused about when they should use > > d_really_is_negative() over d_is_negative() [1] and we probably > > don't want to add d_really_really_is_negative(), how about > > applying that READ_ONCE into d_really_is_negative() and > > re-purpose it as a macro to be used when races with lookup are > > a concern? > > Would you care to explain what that "fix" would've achieved here, > considering the fact that barriers are no-ops on UP and this is > *NOT* an SMP race? > > And it's very much present on UP - we have > fetch ->d_inode into local variable > do blocking allocation > check if ->d_inode is NULL now > if it is not, use the value in local variable and expect it to be non-NULL > > That's not a case of missing barriers. At all. And no redefinition of > d_really_is_negative() is going to help - it can't retroactively affect > the value explicitly fetched into a local variable some time prior to > that. > Indeed. I missed that part of your commit message and didn't realize the variable was being used later. The language in the comment "can go positive under us" implied SMP race so I misunderstood the reason for READ_ONCE(). Sorry for the noise. Amir. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable 2019-11-13 12:52 ` Al Viro 2019-11-13 16:22 ` Amir Goldstein @ 2019-11-13 20:18 ` Jean-Louis Biasini 1 sibling, 0 replies; 8+ messages in thread From: Jean-Louis Biasini @ 2019-11-13 20:18 UTC (permalink / raw) To: Al Viro, Amir Goldstein Cc: linux-fsdevel, Ritesh Harjani, linux-kernel, wugyuan, Jeff Layton, Gao Xiang, Jan Kara, Linus Torvalds, ecryptfs Please can you UNSUBSCRIBE me from this list? thx Le 13/11/2019 à 13:52, Al Viro a écrit : > On Wed, Nov 13, 2019 at 09:01:36AM +0200, Amir Goldstein wrote: >>> - if (d_really_is_negative(lower_dentry)) { >>> + /* >>> + * negative dentry can go positive under us here - its parent is not >>> + * locked. That's OK and that could happen just as we return from >>> + * ecryptfs_lookup() anyway. Just need to be careful and fetch >>> + * ->d_inode only once - it's not stable here. >>> + */ >>> + lower_inode = READ_ONCE(lower_dentry->d_inode); >>> + >>> + if (!lower_inode) { >>> /* We want to add because we couldn't find in lower */ >>> d_add(dentry, NULL); >>> return NULL; >> Sigh! >> >> Open coding a human readable macro to solve a subtle lookup race. >> That doesn't sound like a scalable solution. >> I have a feeling this is not the last patch we will be seeing along >> those lines. >> >> Seeing that developers already confused about when they should use >> d_really_is_negative() over d_is_negative() [1] and we probably >> don't want to add d_really_really_is_negative(), how about >> applying that READ_ONCE into d_really_is_negative() and >> re-purpose it as a macro to be used when races with lookup are >> a concern? > Would you care to explain what that "fix" would've achieved here, > considering the fact that barriers are no-ops on UP and this is > *NOT* an SMP race? > > And it's very much present on UP - we have > fetch ->d_inode into local variable > do blocking allocation > check if ->d_inode is NULL now > if it is not, use the value in local variable and expect it to be non-NULL > > That's not a case of missing barriers. At all. And no redefinition of > d_really_is_negative() is going to help - it can't retroactively affect > the value explicitly fetched into a local variable some time prior to > that. > > There are other patches dealing with ->d_inode accesses, but they are > generally not along the same lines. The problem is rarely the same... > ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2019-11-13 20:18 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20190927044243.18856-1-riteshh@linux.ibm.com>
[not found] ` <20191015040730.6A84742047@d06av24.portsmouth.uk.ibm.com>
[not found] ` <20191022133855.B1B4752050@d06av21.portsmouth.uk.ibm.com>
[not found] ` <20191022143736.GX26530@ZenIV.linux.org.uk>
[not found] ` <20191022201131.GZ26530@ZenIV.linux.org.uk>
[not found] ` <20191023110551.D04AE4C044@d06av22.portsmouth.uk.ibm.com>
[not found] ` <20191101234622.GM26530@ZenIV.linux.org.uk>
[not found] ` <20191102172229.GT20975@paulmck-ThinkPad-P72>
[not found] ` <20191102180842.GN26530@ZenIV.linux.org.uk>
2019-11-03 17:05 ` [PATCH][RFC] ecryptfs unlink/rmdir breakage (similar to caught in ecryptfs rename last year) Al Viro
[not found] ` <20191103163524.GO26530@ZenIV.linux.org.uk>
2019-11-03 18:20 ` [RFC] lookup_one_len_unlocked() lousy calling conventions Al Viro
2019-11-03 18:51 ` [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable Al Viro
2019-11-03 19:03 ` [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_parent is not stable either Al Viro
2019-11-13 7:01 ` [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable Amir Goldstein
2019-11-13 12:52 ` Al Viro
2019-11-13 16:22 ` Amir Goldstein
2019-11-13 20:18 ` Jean-Louis Biasini
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).