linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] ceph/mds_client: transfer CEPH_CAP_PIN when updating r_parent on mismatch
@ 2025-09-01 15:14 Alex Markuze
  2025-09-01 15:14 ` [PATCH 2/2] ceph/inode: drop extra reference from ceph_get_reply_dir() in ceph_fill_trace() Alex Markuze
  2025-09-02 18:42 ` [PATCH 1/2] ceph/mds_client: transfer CEPH_CAP_PIN when updating r_parent on mismatch Viacheslav Dubeyko
  0 siblings, 2 replies; 6+ messages in thread
From: Alex Markuze @ 2025-09-01 15:14 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-kernel, Slava.Dubeyko, idryomov, Alex Markuze

When the parent directory lock is not held, req->r_parent can become stale between dentry lookup and request encoding.
The client updates r_parent to the correct inode based on the encoded path, but previously did not adjust CEPH_CAP_PIN references.

Release the pin from the old parent and acquire it for the new parent when switching r_parent, ensuring reference accounting stays balanced and avoiding leaks or underflows later in ceph_mdsc_release_request().

Signed-off-by: Alex Markuze <amarkuze@redhat.com>
---
 fs/ceph/mds_client.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index ce0c129f4651..4e5926f36e8d 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -3053,12 +3053,19 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 	 */
 	if (!parent_locked && req->r_parent && path_info1.vino.ino &&
 	    ceph_ino(req->r_parent) != path_info1.vino.ino) {
+		struct inode *old_parent = req->r_parent;
 		struct inode *correct_dir = ceph_get_inode(mdsc->fsc->sb, path_info1.vino, NULL);
 		if (!IS_ERR(correct_dir)) {
 			WARN_ONCE(1, "ceph: r_parent mismatch (had %llx wanted %llx) - updating\n",
-				  ceph_ino(req->r_parent), path_info1.vino.ino);
-			iput(req->r_parent);
+			          ceph_ino(old_parent), path_info1.vino.ino);
+			/*
+			 * Transfer CEPH_CAP_PIN from the old parent to the new one.
+			 * The pin was taken earlier in ceph_mdsc_submit_request().
+			 */
+			ceph_put_cap_refs(ceph_inode(old_parent), CEPH_CAP_PIN);
+			iput(old_parent);
 			req->r_parent = correct_dir;
+			ceph_get_cap_refs(ceph_inode(req->r_parent), CEPH_CAP_PIN);
 		}
 	}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] ceph/inode: drop extra reference from ceph_get_reply_dir() in ceph_fill_trace()
  2025-09-01 15:14 [PATCH 1/2] ceph/mds_client: transfer CEPH_CAP_PIN when updating r_parent on mismatch Alex Markuze
@ 2025-09-01 15:14 ` Alex Markuze
  2025-09-03 18:51   ` Viacheslav Dubeyko
  2025-09-02 18:42 ` [PATCH 1/2] ceph/mds_client: transfer CEPH_CAP_PIN when updating r_parent on mismatch Viacheslav Dubeyko
  1 sibling, 1 reply; 6+ messages in thread
From: Alex Markuze @ 2025-09-01 15:14 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-kernel, Slava.Dubeyko, idryomov, Alex Markuze

ceph_get_reply_dir() may return a different, referenced inode when r_parent is stale and the parent directory lock is not held.
ceph_fill_trace() used that inode but failed to drop the reference when it differed from req->r_parent, leaking an inode reference.

Keep the directory inode in a local and iput() it at function end if it does not match req->r_parent.

Signed-off-by: Alex Markuze <amarkuze@redhat.com>
---
 fs/ceph/inode.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 470ee595ecf2..439c08ece283 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1585,6 +1585,7 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 	struct ceph_fs_client *fsc = ceph_sb_to_fs_client(sb);
 	struct ceph_client *cl = fsc->client;
 	int err = 0;
+	struct inode *dir = NULL;
 
 	doutc(cl, "%p is_dentry %d is_target %d\n", req,
 	      rinfo->head->is_dentry, rinfo->head->is_target);
@@ -1601,7 +1602,11 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 		 * r_parent may be stale, in cases when R_PARENT_LOCKED is not set,
 		 * so we need to get the correct inode
 		 */
-		struct inode *dir = ceph_get_reply_dir(sb, req->r_parent, rinfo);
+		dir = ceph_get_reply_dir(sb, req->r_parent, rinfo);
+		if (IS_ERR(dir)) {
+			err = PTR_ERR(dir);
+			goto done;
+		}
 		if (dir) {
 			err = ceph_fill_inode(dir, NULL, &rinfo->diri,
 					      rinfo->dirfrag, session, -1,
@@ -1869,6 +1874,9 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 					    &dvino, ptvino);
 	}
 done:
+	/* Drop extra ref from ceph_get_reply_dir() if it returned a new inode */
+	if (!IS_ERR(dir) && dir && dir != req->r_parent)
+		iput(dir);
 	doutc(cl, "done err=%d\n", err);
 	return err;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re:  [PATCH 1/2] ceph/mds_client: transfer CEPH_CAP_PIN when updating r_parent on mismatch
  2025-09-01 15:14 [PATCH 1/2] ceph/mds_client: transfer CEPH_CAP_PIN when updating r_parent on mismatch Alex Markuze
  2025-09-01 15:14 ` [PATCH 2/2] ceph/inode: drop extra reference from ceph_get_reply_dir() in ceph_fill_trace() Alex Markuze
@ 2025-09-02 18:42 ` Viacheslav Dubeyko
  2025-09-03  8:14   ` Alex Markuze
  1 sibling, 1 reply; 6+ messages in thread
From: Viacheslav Dubeyko @ 2025-09-02 18:42 UTC (permalink / raw)
  To: Alex Markuze, ceph-devel@vger.kernel.org
  Cc: idryomov@gmail.com, linux-kernel@vger.kernel.org

On Mon, 2025-09-01 at 15:14 +0000, Alex Markuze wrote:
> When the parent directory lock is not held, req->r_parent can become stale between dentry lookup and request encoding.
> The client updates r_parent to the correct inode based on the encoded path, but previously did not adjust CEPH_CAP_PIN references.
> 
> Release the pin from the old parent and acquire it for the new parent when switching r_parent, ensuring reference accounting stays balanced and avoiding leaks or underflows later in ceph_mdsc_release_request().
> 

I cannot apply the patch on current state of the kernel. I assume that this
patch is improvement of previous patch set. If so, then it will be better to
send another version of previous patch set. Otherwise, it's hard to review and
impossible to test it.

Thanks,
Slava.

> Signed-off-by: Alex Markuze <amarkuze@redhat.com>
> ---
>  fs/ceph/mds_client.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index ce0c129f4651..4e5926f36e8d 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -3053,12 +3053,19 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>  	 */
>  	if (!parent_locked && req->r_parent && path_info1.vino.ino &&
>  	    ceph_ino(req->r_parent) != path_info1.vino.ino) {
> +		struct inode *old_parent = req->r_parent;
>  		struct inode *correct_dir = ceph_get_inode(mdsc->fsc->sb, path_info1.vino, NULL);
>  		if (!IS_ERR(correct_dir)) {
>  			WARN_ONCE(1, "ceph: r_parent mismatch (had %llx wanted %llx) - updating\n",
> -				  ceph_ino(req->r_parent), path_info1.vino.ino);
> -			iput(req->r_parent);
> +			          ceph_ino(old_parent), path_info1.vino.ino);
> +			/*
> +			 * Transfer CEPH_CAP_PIN from the old parent to the new one.
> +			 * The pin was taken earlier in ceph_mdsc_submit_request().
> +			 */
> +			ceph_put_cap_refs(ceph_inode(old_parent), CEPH_CAP_PIN);
> +			iput(old_parent);
>  			req->r_parent = correct_dir;
> +			ceph_get_cap_refs(ceph_inode(req->r_parent), CEPH_CAP_PIN);
>  		}
>  	}
>  

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] ceph/mds_client: transfer CEPH_CAP_PIN when updating r_parent on mismatch
  2025-09-02 18:42 ` [PATCH 1/2] ceph/mds_client: transfer CEPH_CAP_PIN when updating r_parent on mismatch Viacheslav Dubeyko
@ 2025-09-03  8:14   ` Alex Markuze
  2025-09-03 18:41     ` Viacheslav Dubeyko
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Markuze @ 2025-09-03  8:14 UTC (permalink / raw)
  To: Viacheslav Dubeyko
  Cc: ceph-devel@vger.kernel.org, idryomov@gmail.com,
	linux-kernel@vger.kernel.org

These patches apply to the testing branch. They update the r_parent race fix.

commit a69ac54928a45ad66b6ba84f9bd4be2fd0f9518e
Author: Alex Markuze <amarkuze@redhat.com>
Date:   Tue Aug 12 09:57:39 2025 +0000

    ceph: fix race condition where r_parent becomes stale before sending message

    When the parent directory's i_rwsem is not locked, req->r_parent may become
    stale due to concurrent operations (e.g. rename) between dentry lookup and
    message creation. Validate that r_parent matches the encoded parent inode
    and update to the correct inode if a mismatch is detected.

    Signed-off-by: Alex Markuze <amarkuze@redhat.com>
    Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

commit 7128e41a490709c759fde32898eade197acd0978
Author: Alex Markuze <amarkuze@redhat.com>
Date:   Tue Aug 12 09:57:38 2025 +0000

    ceph: fix race condition validating r_parent before applying state

    Add validation to ensure the cached parent directory inode matches the
    directory info in MDS replies. This prevents client-side race conditions
    where concurrent operations (e.g. rename) cause r_parent to become stale
    between request initiation and reply processing, which could lead to
    applying state changes to incorrect directory inodes.

    Signed-off-by: Alex Markuze <amarkuze@redhat.com>
    Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

On Tue, Sep 2, 2025 at 9:42 PM Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
>
> On Mon, 2025-09-01 at 15:14 +0000, Alex Markuze wrote:
> > When the parent directory lock is not held, req->r_parent can become stale between dentry lookup and request encoding.
> > The client updates r_parent to the correct inode based on the encoded path, but previously did not adjust CEPH_CAP_PIN references.
> >
> > Release the pin from the old parent and acquire it for the new parent when switching r_parent, ensuring reference accounting stays balanced and avoiding leaks or underflows later in ceph_mdsc_release_request().
> >
>
> I cannot apply the patch on current state of the kernel. I assume that this
> patch is improvement of previous patch set. If so, then it will be better to
> send another version of previous patch set. Otherwise, it's hard to review and
> impossible to test it.
>
> Thanks,
> Slava.
>
> > Signed-off-by: Alex Markuze <amarkuze@redhat.com>
> > ---
> >  fs/ceph/mds_client.c | 11 +++++++++--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > index ce0c129f4651..4e5926f36e8d 100644
> > --- a/fs/ceph/mds_client.c
> > +++ b/fs/ceph/mds_client.c
> > @@ -3053,12 +3053,19 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >        */
> >       if (!parent_locked && req->r_parent && path_info1.vino.ino &&
> >           ceph_ino(req->r_parent) != path_info1.vino.ino) {
> > +             struct inode *old_parent = req->r_parent;
> >               struct inode *correct_dir = ceph_get_inode(mdsc->fsc->sb, path_info1.vino, NULL);
> >               if (!IS_ERR(correct_dir)) {
> >                       WARN_ONCE(1, "ceph: r_parent mismatch (had %llx wanted %llx) - updating\n",
> > -                               ceph_ino(req->r_parent), path_info1.vino.ino);
> > -                     iput(req->r_parent);
> > +                               ceph_ino(old_parent), path_info1.vino.ino);
> > +                     /*
> > +                      * Transfer CEPH_CAP_PIN from the old parent to the new one.
> > +                      * The pin was taken earlier in ceph_mdsc_submit_request().
> > +                      */
> > +                     ceph_put_cap_refs(ceph_inode(old_parent), CEPH_CAP_PIN);
> > +                     iput(old_parent);
> >                       req->r_parent = correct_dir;
> > +                     ceph_get_cap_refs(ceph_inode(req->r_parent), CEPH_CAP_PIN);
> >               }
> >       }
> >


^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH 1/2] ceph/mds_client: transfer CEPH_CAP_PIN when updating r_parent on mismatch
  2025-09-03  8:14   ` Alex Markuze
@ 2025-09-03 18:41     ` Viacheslav Dubeyko
  0 siblings, 0 replies; 6+ messages in thread
From: Viacheslav Dubeyko @ 2025-09-03 18:41 UTC (permalink / raw)
  To: Alex Markuze
  Cc: ceph-devel@vger.kernel.org, idryomov@gmail.com,
	linux-kernel@vger.kernel.org

On Wed, 2025-09-03 at 11:14 +0300, Alex Markuze wrote:
> These patches apply to the testing branch. They update the r_parent race fix.
> 
> commit a69ac54928a45ad66b6ba84f9bd4be2fd0f9518e
> Author: Alex Markuze <amarkuze@redhat.com>
> Date:   Tue Aug 12 09:57:39 2025 +0000
> 
>     ceph: fix race condition where r_parent becomes stale before sending message
> 
>     When the parent directory's i_rwsem is not locked, req->r_parent may become
>     stale due to concurrent operations (e.g. rename) between dentry lookup and
>     message creation. Validate that r_parent matches the encoded parent inode
>     and update to the correct inode if a mismatch is detected.
> 
>     Signed-off-by: Alex Markuze <amarkuze@redhat.com>
>     Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
>     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> 
> commit 7128e41a490709c759fde32898eade197acd0978
> Author: Alex Markuze <amarkuze@redhat.com>
> Date:   Tue Aug 12 09:57:38 2025 +0000
> 
>     ceph: fix race condition validating r_parent before applying state
> 
>     Add validation to ensure the cached parent directory inode matches the
>     directory info in MDS replies. This prevents client-side race conditions
>     where concurrent operations (e.g. rename) cause r_parent to become stale
>     between request initiation and reply processing, which could lead to
>     applying state changes to incorrect directory inodes.
> 
>     Signed-off-by: Alex Markuze <amarkuze@redhat.com>
>     Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
>     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> 

OK. Thanks for explaining this.

> On Tue, Sep 2, 2025 at 9:42 PM Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
> > 
> > On Mon, 2025-09-01 at 15:14 +0000, Alex Markuze wrote:
> > > When the parent directory lock is not held, req->r_parent can become stale between dentry lookup and request encoding.
> > > The client updates r_parent to the correct inode based on the encoded path, but previously did not adjust CEPH_CAP_PIN references.
> > > 
> > > Release the pin from the old parent and acquire it for the new parent when switching r_parent, ensuring reference accounting stays balanced and avoiding leaks or underflows later in ceph_mdsc_release_request().
> > > 
> > 

I think that it makes sense to explain in brief what is the
responsibility of CEPH_CAP_PIN and why it is important to move
the CEPH_CAP_PIN from the old parent to the new one.

> > 
> > > Signed-off-by: Alex Markuze <amarkuze@redhat.com>
> > > ---
> > >  fs/ceph/mds_client.c | 11 +++++++++--
> > >  1 file changed, 9 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > index ce0c129f4651..4e5926f36e8d 100644
> > > --- a/fs/ceph/mds_client.c
> > > +++ b/fs/ceph/mds_client.c
> > > @@ -3053,12 +3053,19 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> > >        */
> > >       if (!parent_locked && req->r_parent && path_info1.vino.ino &&
> > >           ceph_ino(req->r_parent) != path_info1.vino.ino) {
> > > +             struct inode *old_parent = req->r_parent;
> > >               struct inode *correct_dir = ceph_get_inode(mdsc->fsc->sb, path_info1.vino, NULL);
> > >               if (!IS_ERR(correct_dir)) {
> > >                       WARN_ONCE(1, "ceph: r_parent mismatch (had %llx wanted %llx) - updating\n",
> > > -                               ceph_ino(req->r_parent), path_info1.vino.ino);
> > > -                     iput(req->r_parent);
> > > +                               ceph_ino(old_parent), path_info1.vino.ino);
> > > +                     /*
> > > +                      * Transfer CEPH_CAP_PIN from the old parent to the new one.
> > > +                      * The pin was taken earlier in ceph_mdsc_submit_request().
> > > +                      */
> > > +                     ceph_put_cap_refs(ceph_inode(old_parent), CEPH_CAP_PIN);
> > > +                     iput(old_parent);
> > >                       req->r_parent = correct_dir;
> > > +                     ceph_get_cap_refs(ceph_inode(req->r_parent), CEPH_CAP_PIN);
> > >               }
> > >       }
> > > 

The patch looks good. But the commit message can be improved to be more
informative, from my point of view.

Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>

Thanks,
Slava.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re:  [PATCH 2/2] ceph/inode: drop extra reference from ceph_get_reply_dir() in ceph_fill_trace()
  2025-09-01 15:14 ` [PATCH 2/2] ceph/inode: drop extra reference from ceph_get_reply_dir() in ceph_fill_trace() Alex Markuze
@ 2025-09-03 18:51   ` Viacheslav Dubeyko
  0 siblings, 0 replies; 6+ messages in thread
From: Viacheslav Dubeyko @ 2025-09-03 18:51 UTC (permalink / raw)
  To: Alex Markuze, ceph-devel@vger.kernel.org
  Cc: idryomov@gmail.com, linux-kernel@vger.kernel.org

On Mon, 2025-09-01 at 15:14 +0000, Alex Markuze wrote:
> ceph_get_reply_dir() may return a different, referenced inode when r_parent is stale and the parent directory lock is not held.
> ceph_fill_trace() used that inode but failed to drop the reference when it differed from req->r_parent, leaking an inode reference.
> 
> Keep the directory inode in a local and iput() it at function end if it does not match req->r_parent.
> 
> Signed-off-by: Alex Markuze <amarkuze@redhat.com>
> ---
>  fs/ceph/inode.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index 470ee595ecf2..439c08ece283 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -1585,6 +1585,7 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
>  	struct ceph_fs_client *fsc = ceph_sb_to_fs_client(sb);
>  	struct ceph_client *cl = fsc->client;
>  	int err = 0;
> +	struct inode *dir = NULL;

Probably, we need to declare the dir pointer before err declaration. What do you
think?

>  
>  	doutc(cl, "%p is_dentry %d is_target %d\n", req,
>  	      rinfo->head->is_dentry, rinfo->head->is_target);
> @@ -1601,7 +1602,11 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
>  		 * r_parent may be stale, in cases when R_PARENT_LOCKED is not set,
>  		 * so we need to get the correct inode
>  		 */
> -		struct inode *dir = ceph_get_reply_dir(sb, req->r_parent, rinfo);
> +		dir = ceph_get_reply_dir(sb, req->r_parent, rinfo);
> +		if (IS_ERR(dir)) {
> +			err = PTR_ERR(dir);
> +			goto done;
> +		}
>  		if (dir) {
>  			err = ceph_fill_inode(dir, NULL, &rinfo->diri,
>  					      rinfo->dirfrag, session, -1,
> @@ -1869,6 +1874,9 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
>  					    &dvino, ptvino);
>  	}
>  done:
> +	/* Drop extra ref from ceph_get_reply_dir() if it returned a new inode */
> +	if (!IS_ERR(dir) && dir && dir != req->r_parent)

I think it makes sense to check the dir on NULL at first, then on error.
Maybe, we need to name dir variable as parent_dir or simply parent?

Thanks,
Slava.

> +		iput(dir);
>  	doutc(cl, "done err=%d\n", err);
>  	return err;
>  }

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-09-03 18:51 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-01 15:14 [PATCH 1/2] ceph/mds_client: transfer CEPH_CAP_PIN when updating r_parent on mismatch Alex Markuze
2025-09-01 15:14 ` [PATCH 2/2] ceph/inode: drop extra reference from ceph_get_reply_dir() in ceph_fill_trace() Alex Markuze
2025-09-03 18:51   ` Viacheslav Dubeyko
2025-09-02 18:42 ` [PATCH 1/2] ceph/mds_client: transfer CEPH_CAP_PIN when updating r_parent on mismatch Viacheslav Dubeyko
2025-09-03  8:14   ` Alex Markuze
2025-09-03 18:41     ` Viacheslav Dubeyko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).