* [PATCH] SUNRPC: close a rare race in xs_tcp_setup_socket.
@ 2013-10-31 5:14 NeilBrown
2013-10-31 13:29 ` Myklebust, Trond
0 siblings, 1 reply; 2+ messages in thread
From: NeilBrown @ 2013-10-31 5:14 UTC (permalink / raw)
To: Myklebust, Trond; +Cc: NFS
[-- Attachment #1: Type: text/plain, Size: 1782 bytes --]
We have one report of a crash in xs_tcp_setup_socket.
The call path to the crash is:
xs_tcp_setup_socket -> inet_stream_connect -> lock_sock_nested.
The 'sock' passed to that last function is NULL.
The only way I can see this happening is a concurrent call to
xs_close:
xs_close -> xs_reset_transport -> sock_release -> inet_release
inet_release sets:
sock->sk = NULL;
inet_stream_connect calls
lock_sock(sock->sk);
which gets NULL.
All calls to xs_close are protected by XPRT_LOCKED as are most
activations of the workqueue which runs xs_tcp_setup_socket.
The exception is xs_tcp_schedule_linger_timeout.
So presumably the timeout queued by the later fires exactly when some
other code runs xs_close().
To protect against this we can move the cancel_delayed_work_sync()
call from xs_destory() to xs_close().
As xs_close is never called from the worker scheduled on
->connect_worker, this can never deadlock.
Signed-off-by: NeilBrown <neilb@suse.de>
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index ee03d35677d9..b19ba535ae1a 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -835,6 +835,8 @@ static void xs_close(struct rpc_xprt *xprt)
dprintk("RPC: xs_close xprt %p\n", xprt);
+ cancel_delayed_work_sync(&transport->connect_worker);
+
xs_reset_transport(transport);
xprt->reestablish_timeout = 0;
@@ -869,12 +871,8 @@ static void xs_local_destroy(struct rpc_xprt *xprt)
*/
static void xs_destroy(struct rpc_xprt *xprt)
{
- struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
-
dprintk("RPC: xs_destroy xprt %p\n", xprt);
- cancel_delayed_work_sync(&transport->connect_worker);
-
xs_local_destroy(xprt);
}
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: [PATCH] SUNRPC: close a rare race in xs_tcp_setup_socket.
2013-10-31 5:14 [PATCH] SUNRPC: close a rare race in xs_tcp_setup_socket NeilBrown
@ 2013-10-31 13:29 ` Myklebust, Trond
0 siblings, 0 replies; 2+ messages in thread
From: Myklebust, Trond @ 2013-10-31 13:29 UTC (permalink / raw)
To: NeilBrown; +Cc: NFS
On Thu, 2013-10-31 at 16:14 +1100, NeilBrown wrote:
>
> We have one report of a crash in xs_tcp_setup_socket.
> The call path to the crash is:
>
> xs_tcp_setup_socket -> inet_stream_connect -> lock_sock_nested.
>
> The 'sock' passed to that last function is NULL.
>
> The only way I can see this happening is a concurrent call to
> xs_close:
>
> xs_close -> xs_reset_transport -> sock_release -> inet_release
>
> inet_release sets:
> sock->sk = NULL;
> inet_stream_connect calls
> lock_sock(sock->sk);
> which gets NULL.
>
> All calls to xs_close are protected by XPRT_LOCKED as are most
> activations of the workqueue which runs xs_tcp_setup_socket.
> The exception is xs_tcp_schedule_linger_timeout.
>
> So presumably the timeout queued by the later fires exactly when some
> other code runs xs_close().
>
> To protect against this we can move the cancel_delayed_work_sync()
> call from xs_destory() to xs_close().
>
> As xs_close is never called from the worker scheduled on
> ->connect_worker, this can never deadlock.
>
> Signed-off-by: NeilBrown <neilb@suse.de>
>
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index ee03d35677d9..b19ba535ae1a 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -835,6 +835,8 @@ static void xs_close(struct rpc_xprt *xprt)
>
> dprintk("RPC: xs_close xprt %p\n", xprt);
>
> + cancel_delayed_work_sync(&transport->connect_worker);
> +
> xs_reset_transport(transport);
> xprt->reestablish_timeout = 0;
>
> @@ -869,12 +871,8 @@ static void xs_local_destroy(struct rpc_xprt *xprt)
> */
> static void xs_destroy(struct rpc_xprt *xprt)
> {
> - struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
> -
> dprintk("RPC: xs_destroy xprt %p\n", xprt);
>
> - cancel_delayed_work_sync(&transport->connect_worker);
> -
> xs_local_destroy(xprt);
> }
>
Hi Neil,
I noticed one small issue when applying: transport->connect_worker is
not initialised for the AF_LOCAL case. I therefore added a dummy
initialiser to xs_setup_local().
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2013-10-31 13:29 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-31 5:14 [PATCH] SUNRPC: close a rare race in xs_tcp_setup_socket NeilBrown
2013-10-31 13:29 ` Myklebust, Trond
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).