Linux NFS development
 help / color / mirror / Atom feed
From: Trond Myklebust <trondmy@kernel.org>
To: Calum Mackay <calum.mackay@oracle.com>, anna.schumaker@oracle.com
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH] sunrpc: hold RPC client while gss_auth has a link to it
Date: Wed, 27 Aug 2025 16:33:06 -0700	[thread overview]
Message-ID: <6813623b0780828d7e2a6dced04f74f803423f20.camel@kernel.org> (raw)
In-Reply-To: <20250827223831.47513-1-calum.mackay@oracle.com>

On Wed, 2025-08-27 at 23:38 +0100, Calum Mackay wrote:
> We have seen crashes where the RPC client has been freed, while a
> gss_auth
> still has a pointer to it. Subsequently, GSS attempts to send a NULL
> call,
> resulting in a use-after-free in xprt_iter_get_next.
> 
> Fix that by having GSS take a reference on the RPC client, in
> gss_create_new, and release it in gss_free.
> 
> The crashes we've seen have been on a v5.4-based kernel. They are not
> reproducible on demand, happening once every few months under heavy
> load.
> 
> The crashing thread is an rpc_async_release worker:
> 
> xprt_iter_ops (net/sunrpc/xprtmultipath.c:184:43)
> xprt_iter_get_next (net/sunrpc/xprtmultipath.c:527:35)
> rpc_task_get_next_xprt (net/sunrpc/clnt.c:1062:9)
> rpc_task_set_transport (net/sunrpc/clnt.c:1073:19)
> rpc_task_set_transport (net/sunrpc/clnt.c:1066:6)
> rpc_task_set_client (net/sunrpc/clnt.c:1081:3)
> rpc_run_task (net/sunrpc/clnt.c:1133:2)
> rpc_call_null_helper (net/sunrpc/clnt.c:2766:9)
> rpc_call_null (net/sunrpc/clnt.c:2771:9)
> gss_send_destroy_context (net/sunrpc/auth_gss/auth_gss.c:1274:10)
> gss_destroy_cred (net/sunrpc/auth_gss/auth_gss.c:1341:3)
> put_rpccred (net/sunrpc/auth.c:758:2)
> put_rpccred (net/sunrpc/auth.c:733:1)
> __put_nfs_open_context (fs/nfs/inode.c:1010:2)
> put_nfs_open_context (fs/nfs/inode.c:1017:2)
> nfs_pgio_data_destroy (fs/nfs/pagelist.c:562:3)
> nfs_pgio_header_free (fs/nfs/pagelist.c:573:2)
> nfs_read_completion (fs/nfs/read.c:200:2)
> nfs_pgio_release (fs/nfs/pagelist.c:699:2)
> rpc_release_calldata (net/sunrpc/sched.c:890:3)
> rpc_free_task (net/sunrpc/sched.c:1171:2)
> rpc_async_release (net/sunrpc/sched.c:1183:2)
> process_one_work (kernel/workqueue.c:2295:2)
> worker_thread (kernel/workqueue.c:2448:4)
> kthread (kernel/kthread.c:296:9)
> ret_from_fork+0x2b/0x36 (arch/x86/entry/entry_64.S:358)
> 
> Immediately before __put_nfs_open_context calls put_rpccred, above,
> it calls nfs_sb_deactive, which frees the RPC client:
> 
> rpc_free_client+189 [sunrpc]
> rpc_release_client+98 [sunrpc]
> rpc_shutdown_client+98 [sunrpc]
> nfs_free_client+123 [nfs]
> nfs_put_client+217 [nfs]
> nfs_free_server+81 [nfs]
> nfs_kill_super+49 [nfs]
> deactivate_locked_super+58
> deactivate_super+89
> nfs_sb_deactive+36 [nfs]
> 
> After the RPC client is freed here, __put_nfs_open_context calls
> put_rpccred, which wants to destroy the cred, since its refcnt is now
> zero. Since the cred is not marked as UPTODATE,
> gss_send_destroy_context
> needs to send a NULL call to the server, to get it to release all
> state
> for this context.  For this NULL call, we need an RPC client with an
> associated xprt; whilst looking for one, we trip over the freed xpi,
> that we freed earlier from nfs_sb_deactive.
> 
> Ensuring that the RPC client refcnt is incremented whilst gss_auth
> has a
> pointer to it would ensure that the client can't be freed until
> gss_auth
> has been freed.
> 
> Whilst the above occurred on a v5.4 kernel, I don't see anything
> obvious
> that would stop this happening to current code.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
> ---
>  net/sunrpc/auth_gss/auth_gss.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sunrpc/auth_gss/auth_gss.c
> b/net/sunrpc/auth_gss/auth_gss.c
> index 5c095cb8cb20..8c2cc92c6cd6 100644
> --- a/net/sunrpc/auth_gss/auth_gss.c
> +++ b/net/sunrpc/auth_gss/auth_gss.c
> @@ -1026,6 +1026,13 @@ gss_create_new(const struct
> rpc_auth_create_args *args, struct rpc_clnt *clnt)
>  
>  	if (!try_module_get(THIS_MODULE))
>  		return ERR_PTR(err);
> +
> +	/*
> +	 * Make sure the RPC client can't be freed while gss_auth
> has
> +	 * a link to it
> +	 */
> +	refcount_inc(&clnt->cl_count);
> +

NACK. We can't have the client hold a reference to the auth_gss struct
and then have the same auth_gss hold a reference back to the client.
That's a guaranteed leak of both objects.


-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com

  reply	other threads:[~2025-08-27 23:33 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-27 22:38 [PATCH] sunrpc: hold RPC client while gss_auth has a link to it Calum Mackay
2025-08-27 23:33 ` Trond Myklebust [this message]
2025-08-27 23:48   ` Calum Mackay
2025-08-28  0:04     ` Trond Myklebust
2025-08-28  0:17       ` Calum Mackay

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6813623b0780828d7e2a6dced04f74f803423f20.camel@kernel.org \
    --to=trondmy@kernel.org \
    --cc=anna.schumaker@oracle.com \
    --cc=calum.mackay@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox