* [PATCH v4 1/5] ceph: clean up unsafe d_parent access in __choose_mds
2016-12-15 11:34 [PATCH v4 0/5] ceph: clean up some unsafe dentry->d_parent accesses Jeff Layton
@ 2016-12-15 11:34 ` Jeff Layton
2016-12-15 11:34 ` [PATCH v4 2/5] ceph: clean up unsafe d_parent accesses in build_dentry_path Jeff Layton
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2016-12-15 11:34 UTC (permalink / raw)
To: ceph-devel; +Cc: zyan, sage, idryomov, linux-fsdevel, dhowells, viro
__choose_mds exists to pick an MDS to use when issuing a call. Doing
that typically involves picking an inode and using the authoritative
MDS for it. In most cases, that's pretty straightforward, as we are
using an inode to which we hold a reference (usually represented by
r_dentry or r_inode in the request).
In the case of a snapshotted directory however, we need to fetch
the non-snapped parent, which involves walking back up the parents
in the tree. The dentries in the snapshot dir are effectively frozen
but the overall parent is _not_, and could vanish if a concurrent
rename were to occur.
Clean this code up and take special care to ensure the validity of
the entries we're working with. First, try to use the inode in
r_locked_dir if one exists. If not and all we have is r_dentry,
then we have to walk back up the tree. Use the rcu_read_lock for
this so we can ensure that any d_parent we find won't go away, and
take extra care to deal with the possibility that the dentries could
go negative.
Change get_nonsnap_parent to return an inode, and take a reference to
that inode before returning (if any). Change all of the other places
where we set "inode" in __choose_mds to also take a reference, and then
call iput on that inode before exiting the function.
Link: http://tracker.ceph.com/issues/18148
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
fs/ceph/mds_client.c | 64 ++++++++++++++++++++++++++++++++++------------------
1 file changed, 42 insertions(+), 22 deletions(-)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 815acd1a56d4..20b241aafcee 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -667,6 +667,27 @@ static void __unregister_request(struct ceph_mds_client *mdsc,
}
/*
+ * Walk back up the dentry tree until we hit a dentry representing a
+ * non-snapshot inode. We do this using the rcu_read_lock (which must be held
+ * when calling this) to ensure that the objects won't disappear while we're
+ * working with them. Once we hit a candidate dentry, we attempt to take a
+ * reference to it, and return that as the result.
+ */
+static struct inode *get_nonsnap_parent(struct dentry *dentry) { struct inode
+ *inode = NULL;
+
+ while (dentry && !IS_ROOT(dentry)) {
+ inode = d_inode_rcu(dentry);
+ if (!inode || ceph_snap(inode) == CEPH_NOSNAP)
+ break;
+ dentry = dentry->d_parent;
+ }
+ if (inode)
+ inode = igrab(inode);
+ return inode;
+}
+
+/*
* Choose mds to send request to next. If there is a hint set in the
* request (e.g., due to a prior forward hint from the mds), use that.
* Otherwise, consult frag tree and/or caps to identify the
@@ -674,19 +695,6 @@ static void __unregister_request(struct ceph_mds_client *mdsc,
*
* Called under mdsc->mutex.
*/
-static struct dentry *get_nonsnap_parent(struct dentry *dentry)
-{
- /*
- * we don't need to worry about protecting the d_parent access
- * here because we never renaming inside the snapped namespace
- * except to resplice to another snapdir, and either the old or new
- * result is a valid result.
- */
- while (!IS_ROOT(dentry) && ceph_snap(d_inode(dentry)) != CEPH_NOSNAP)
- dentry = dentry->d_parent;
- return dentry;
-}
-
static int __choose_mds(struct ceph_mds_client *mdsc,
struct ceph_mds_request *req)
{
@@ -716,30 +724,39 @@ static int __choose_mds(struct ceph_mds_client *mdsc,
inode = NULL;
if (req->r_inode) {
inode = req->r_inode;
+ ihold(inode);
} else if (req->r_dentry) {
/* ignore race with rename; old or new d_parent is okay */
- struct dentry *parent = req->r_dentry->d_parent;
- struct inode *dir = d_inode(parent);
+ struct dentry *parent;
+ struct inode *dir;
+
+ rcu_read_lock();
+ parent = req->r_dentry->d_parent;
+ dir = req->r_locked_dir ? : d_inode_rcu(parent);
- if (dir->i_sb != mdsc->fsc->sb) {
- /* not this fs! */
+ if (!dir || dir->i_sb != mdsc->fsc->sb) {
+ /* not this fs or parent went negative */
inode = d_inode(req->r_dentry);
+ if (inode)
+ ihold(inode);
} else if (ceph_snap(dir) != CEPH_NOSNAP) {
/* direct snapped/virtual snapdir requests
* based on parent dir inode */
- struct dentry *dn = get_nonsnap_parent(parent);
- inode = d_inode(dn);
+ inode = get_nonsnap_parent(parent);
dout("__choose_mds using nonsnap parent %p\n", inode);
} else {
/* dentry target */
inode = d_inode(req->r_dentry);
if (!inode || mode == USE_AUTH_MDS) {
/* dir + name */
- inode = dir;
+ inode = igrab(dir);
hash = ceph_dentry_hash(dir, req->r_dentry);
is_hash = true;
+ } else {
+ ihold(inode);
}
}
+ rcu_read_unlock();
}
dout("__choose_mds %p is_hash=%d (%d) mode %d\n", inode, (int)is_hash,
@@ -768,7 +785,7 @@ static int __choose_mds(struct ceph_mds_client *mdsc,
(int)r, frag.ndist);
if (ceph_mdsmap_get_state(mdsc->mdsmap, mds) >=
CEPH_MDS_STATE_ACTIVE)
- return mds;
+ goto out;
}
/* since this file/dir wasn't known to be
@@ -783,7 +800,7 @@ static int __choose_mds(struct ceph_mds_client *mdsc,
inode, ceph_vinop(inode), frag.frag, mds);
if (ceph_mdsmap_get_state(mdsc->mdsmap, mds) >=
CEPH_MDS_STATE_ACTIVE)
- return mds;
+ goto out;
}
}
}
@@ -796,6 +813,7 @@ static int __choose_mds(struct ceph_mds_client *mdsc,
cap = rb_entry(rb_first(&ci->i_caps), struct ceph_cap, ci_node);
if (!cap) {
spin_unlock(&ci->i_ceph_lock);
+ iput(inode);
goto random;
}
mds = cap->session->s_mds;
@@ -803,6 +821,8 @@ static int __choose_mds(struct ceph_mds_client *mdsc,
inode, ceph_vinop(inode), mds,
cap == ci->i_auth_cap ? "auth " : "", cap);
spin_unlock(&ci->i_ceph_lock);
+out:
+ iput(inode);
return mds;
random:
--
2.7.4
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v4 2/5] ceph: clean up unsafe d_parent accesses in build_dentry_path
2016-12-15 11:34 [PATCH v4 0/5] ceph: clean up some unsafe dentry->d_parent accesses Jeff Layton
2016-12-15 11:34 ` [PATCH v4 1/5] ceph: clean up unsafe d_parent access in __choose_mds Jeff Layton
@ 2016-12-15 11:34 ` Jeff Layton
2016-12-15 11:34 ` [PATCH v4 3/5] ceph: pass parent dir ino info to build_dentry_path Jeff Layton
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2016-12-15 11:34 UTC (permalink / raw)
To: ceph-devel; +Cc: zyan, sage, idryomov, linux-fsdevel, dhowells, viro
While we hold a reference to the dentry when build_dentry_path is
called, we could end up racing with a rename that changes d_parent.
Handle that situation correctly, by using the rcu_read_lock to
ensure that the parent dentry and inode stick around long enough
to safely check ceph_snap and ceph_ino.
Link: http://tracker.ceph.com/issues/18148
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
fs/ceph/mds_client.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 20b241aafcee..03c82953ac68 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1799,13 +1799,18 @@ static int build_dentry_path(struct dentry *dentry,
int *pfreepath)
{
char *path;
+ struct inode *dir;
- if (ceph_snap(d_inode(dentry->d_parent)) == CEPH_NOSNAP) {
- *pino = ceph_ino(d_inode(dentry->d_parent));
+ rcu_read_lock();
+ dir = d_inode_rcu(dentry->d_parent);
+ if (dir && ceph_snap(dir) == CEPH_NOSNAP) {
+ *pino = ceph_ino(dir);
+ rcu_read_unlock();
*ppath = dentry->d_name.name;
*ppathlen = dentry->d_name.len;
return 0;
}
+ rcu_read_unlock();
path = ceph_mdsc_build_path(dentry, ppathlen, pino, 1);
if (IS_ERR(path))
return PTR_ERR(path);
--
2.7.4
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v4 3/5] ceph: pass parent dir ino info to build_dentry_path
2016-12-15 11:34 [PATCH v4 0/5] ceph: clean up some unsafe dentry->d_parent accesses Jeff Layton
2016-12-15 11:34 ` [PATCH v4 1/5] ceph: clean up unsafe d_parent access in __choose_mds Jeff Layton
2016-12-15 11:34 ` [PATCH v4 2/5] ceph: clean up unsafe d_parent accesses in build_dentry_path Jeff Layton
@ 2016-12-15 11:34 ` Jeff Layton
2016-12-15 11:34 ` [PATCH v4 4/5] ceph: fix unsafe dcache access in ceph_encode_dentry_release Jeff Layton
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2016-12-15 11:34 UTC (permalink / raw)
To: ceph-devel; +Cc: zyan, sage, idryomov, linux-fsdevel, dhowells, viro
In the event that we have a parent inode reference in the request, we
can use that instead of mucking about in the dcache. Pass any parent
inode info we have down to build_dentry_path so it can make use of it.
Link: http://tracker.ceph.com/issues/18148
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
fs/ceph/mds_client.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 03c82953ac68..9fe5a56122f0 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1794,15 +1794,15 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *base,
return path;
}
-static int build_dentry_path(struct dentry *dentry,
+static int build_dentry_path(struct dentry *dentry, struct inode *dir,
const char **ppath, int *ppathlen, u64 *pino,
int *pfreepath)
{
char *path;
- struct inode *dir;
rcu_read_lock();
- dir = d_inode_rcu(dentry->d_parent);
+ if (!dir)
+ dir = d_inode_rcu(dentry->d_parent);
if (dir && ceph_snap(dir) == CEPH_NOSNAP) {
*pino = ceph_ino(dir);
rcu_read_unlock();
@@ -1846,8 +1846,8 @@ static int build_inode_path(struct inode *inode,
* an explicit ino+path.
*/
static int set_request_path_attr(struct inode *rinode, struct dentry *rdentry,
- const char *rpath, u64 rino,
- const char **ppath, int *pathlen,
+ struct inode *rdiri, const char *rpath,
+ u64 rino, const char **ppath, int *pathlen,
u64 *ino, int *freepath)
{
int r = 0;
@@ -1857,7 +1857,8 @@ static int set_request_path_attr(struct inode *rinode, struct dentry *rdentry,
dout(" inode %p %llx.%llx\n", rinode, ceph_ino(rinode),
ceph_snap(rinode));
} else if (rdentry) {
- r = build_dentry_path(rdentry, ppath, pathlen, ino, freepath);
+ r = build_dentry_path(rdentry, rdiri, ppath, pathlen, ino,
+ freepath);
dout(" dentry %p %llx/%.*s\n", rdentry, *ino, *pathlen,
*ppath);
} else if (rpath || rino) {
@@ -1890,7 +1891,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_client *mdsc,
int ret;
ret = set_request_path_attr(req->r_inode, req->r_dentry,
- req->r_path1, req->r_ino1.ino,
+ req->r_locked_dir, req->r_path1, req->r_ino1.ino,
&path1, &pathlen1, &ino1, &freepath1);
if (ret < 0) {
msg = ERR_PTR(ret);
@@ -1898,6 +1899,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_client *mdsc,
}
ret = set_request_path_attr(NULL, req->r_old_dentry,
+ req->r_old_dentry_dir,
req->r_path2, req->r_ino2.ino,
&path2, &pathlen2, &ino2, &freepath2);
if (ret < 0) {
--
2.7.4
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v4 4/5] ceph: fix unsafe dcache access in ceph_encode_dentry_release
2016-12-15 11:34 [PATCH v4 0/5] ceph: clean up some unsafe dentry->d_parent accesses Jeff Layton
` (2 preceding siblings ...)
2016-12-15 11:34 ` [PATCH v4 3/5] ceph: pass parent dir ino info to build_dentry_path Jeff Layton
@ 2016-12-15 11:34 ` Jeff Layton
2016-12-15 11:34 ` [PATCH v4 5/5] ceph: pass parent inode info to ceph_encode_dentry_release if we have it Jeff Layton
2016-12-15 13:35 ` [PATCH v4 0/5] ceph: clean up some unsafe dentry->d_parent accesses Yan, Zheng
5 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2016-12-15 11:34 UTC (permalink / raw)
To: ceph-devel; +Cc: zyan, sage, idryomov, linux-fsdevel, dhowells, viro
Accessing d_parent requires some sort of locking or it could vanish
out from under us. Since we take the d_lock anyway, use that to fetch
d_parent and take a reference to it, and then use that reference to
call ceph_encode_inode_release.
Link: http://tracker.ceph.com/issues/18148
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
fs/ceph/caps.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 16e6ded0b7f2..6bc5c1efbb26 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -3834,7 +3834,7 @@ int ceph_encode_inode_release(void **p, struct inode *inode,
int ceph_encode_dentry_release(void **p, struct dentry *dentry,
int mds, int drop, int unless)
{
- struct inode *dir = d_inode(dentry->d_parent);
+ struct dentry *parent;
struct ceph_mds_request_release *rel = *p;
struct ceph_dentry_info *di = ceph_dentry(dentry);
int force = 0;
@@ -3849,9 +3849,12 @@ int ceph_encode_dentry_release(void **p, struct dentry *dentry,
spin_lock(&dentry->d_lock);
if (di->lease_session && di->lease_session->s_mds == mds)
force = 1;
+ parent = dget(dentry->d_parent);
spin_unlock(&dentry->d_lock);
- ret = ceph_encode_inode_release(p, dir, mds, drop, unless, force);
+ ret = ceph_encode_inode_release(p, d_inode(parent), mds, drop,
+ unless, force);
+ dput(parent);
spin_lock(&dentry->d_lock);
if (ret && di->lease_session && di->lease_session->s_mds == mds) {
--
2.7.4
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v4 5/5] ceph: pass parent inode info to ceph_encode_dentry_release if we have it
2016-12-15 11:34 [PATCH v4 0/5] ceph: clean up some unsafe dentry->d_parent accesses Jeff Layton
` (3 preceding siblings ...)
2016-12-15 11:34 ` [PATCH v4 4/5] ceph: fix unsafe dcache access in ceph_encode_dentry_release Jeff Layton
@ 2016-12-15 11:34 ` Jeff Layton
2016-12-15 13:35 ` [PATCH v4 0/5] ceph: clean up some unsafe dentry->d_parent accesses Yan, Zheng
5 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2016-12-15 11:34 UTC (permalink / raw)
To: ceph-devel; +Cc: zyan, sage, idryomov, linux-fsdevel, dhowells, viro
If we have a parent inode reference already, then we don't need to
go back up the directory tree to find one.
Link: http://tracker.ceph.com/issues/18148
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
fs/ceph/caps.c | 11 +++++++----
fs/ceph/mds_client.c | 7 +++++--
fs/ceph/super.h | 1 +
3 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 6bc5c1efbb26..e960b51dcae1 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -3832,9 +3832,10 @@ int ceph_encode_inode_release(void **p, struct inode *inode,
}
int ceph_encode_dentry_release(void **p, struct dentry *dentry,
+ struct inode *dir,
int mds, int drop, int unless)
{
- struct dentry *parent;
+ struct dentry *parent = NULL;
struct ceph_mds_request_release *rel = *p;
struct ceph_dentry_info *di = ceph_dentry(dentry);
int force = 0;
@@ -3849,11 +3850,13 @@ int ceph_encode_dentry_release(void **p, struct dentry *dentry,
spin_lock(&dentry->d_lock);
if (di->lease_session && di->lease_session->s_mds == mds)
force = 1;
- parent = dget(dentry->d_parent);
+ if (!dir) {
+ parent = dget(dentry->d_parent);
+ dir = d_inode(parent);
+ }
spin_unlock(&dentry->d_lock);
- ret = ceph_encode_inode_release(p, d_inode(parent), mds, drop,
- unless, force);
+ ret = ceph_encode_inode_release(p, dir, mds, drop, unless, force);
dput(parent);
spin_lock(&dentry->d_lock);
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 9fe5a56122f0..a29997ce982b 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1953,10 +1953,13 @@ static struct ceph_msg *create_request_message(struct ceph_mds_client *mdsc,
mds, req->r_inode_drop, req->r_inode_unless, 0);
if (req->r_dentry_drop)
releases += ceph_encode_dentry_release(&p, req->r_dentry,
- mds, req->r_dentry_drop, req->r_dentry_unless);
+ req->r_locked_dir, mds, req->r_dentry_drop,
+ req->r_dentry_unless);
if (req->r_old_dentry_drop)
releases += ceph_encode_dentry_release(&p, req->r_old_dentry,
- mds, req->r_old_dentry_drop, req->r_old_dentry_unless);
+ req->r_old_dentry_dir, mds,
+ req->r_old_dentry_drop,
+ req->r_old_dentry_unless);
if (req->r_old_inode_drop)
releases += ceph_encode_inode_release(&p,
d_inode(req->r_old_dentry),
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 3e3fa9163059..e4ca0b718b6f 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -901,6 +901,7 @@ extern void ceph_flush_dirty_caps(struct ceph_mds_client *mdsc);
extern int ceph_encode_inode_release(void **p, struct inode *inode,
int mds, int drop, int unless, int force);
extern int ceph_encode_dentry_release(void **p, struct dentry *dn,
+ struct inode *dir,
int mds, int drop, int unless);
extern int ceph_get_caps(struct ceph_inode_info *ci, int need, int want,
--
2.7.4
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v4 0/5] ceph: clean up some unsafe dentry->d_parent accesses
2016-12-15 11:34 [PATCH v4 0/5] ceph: clean up some unsafe dentry->d_parent accesses Jeff Layton
` (4 preceding siblings ...)
2016-12-15 11:34 ` [PATCH v4 5/5] ceph: pass parent inode info to ceph_encode_dentry_release if we have it Jeff Layton
@ 2016-12-15 13:35 ` Yan, Zheng
5 siblings, 0 replies; 7+ messages in thread
From: Yan, Zheng @ 2016-12-15 13:35 UTC (permalink / raw)
To: Jeff Layton
Cc: ceph-devel, Sage Weil, Ilya Dryomov, linux-fsdevel, David Howells,
viro
> On 15 Dec 2016, at 19:34, Jeff Layton <jlayton@redhat.com> wrote:
>
> v4: handle r_locked_dir correctly in __choose_mds
> rename rdirino to rdiri
>
> v3: use parent inode information more widely when we have it
>
> v2: use r_locked_dir in __choose_mds if it's available
> handle NULL d_inode_rcu(parent) properly
>
> In several places, the kcephfs client accesses dentry->d_parent without
> holding appropriate locking. This is not safe, as you can race with
> renames that can morph the tree such that is changes and dentries found
> previously can end up freed. This patchset fixes these places to access
> the parent(s) safely, mostly using the rcu_read_lock().
>
> Note that I'm not aware of any specific bug reports in this area --
> these were just discovered by inspection (mostly by Zheng). This passes
> xfstests at least up to test generic/095, where we hit an unrelated
> softlockup in the splice write code (http://tracker.ceph.com/issues/18130).
>
> Tracker bug link: http://tracker.ceph.com/issues/18148
>
>
> *** BLURB HERE ***
>
> clone of "ceph-4.10"
>
> Jeff Layton (5):
> ceph: clean up unsafe d_parent access in __choose_mds
> ceph: clean up unsafe d_parent accesses in build_dentry_path
> ceph: pass parent dir ino info to build_dentry_path
> ceph: fix unsafe dcache access in ceph_encode_dentry_release
> ceph: pass parent inode info to ceph_encode_dentry_release if we have
> it
>
> fs/ceph/caps.c | 8 ++++-
> fs/ceph/mds_client.c | 92 ++++++++++++++++++++++++++++++++++------------------
> fs/ceph/super.h | 1 +
> 3 files changed, 69 insertions(+), 32 deletions(-)
>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
> --
> 2.7.4
>
^ permalink raw reply [flat|nested] 7+ messages in thread