linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] SUNRPC: close a rare race in xs_tcp_setup_socket.
@ 2013-10-31  5:14 NeilBrown
  2013-10-31 13:29 ` Myklebust, Trond
  0 siblings, 1 reply; 2+ messages in thread
From: NeilBrown @ 2013-10-31  5:14 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: NFS

[-- Attachment #1: Type: text/plain, Size: 1782 bytes --]



We have one report of a crash in xs_tcp_setup_socket.
The call path to the crash is:

  xs_tcp_setup_socket -> inet_stream_connect -> lock_sock_nested.

The 'sock' passed to that last function is NULL.

The only way I can see this happening is a concurrent call to
xs_close:

  xs_close -> xs_reset_transport -> sock_release -> inet_release

inet_release sets:
   sock->sk = NULL;
inet_stream_connect calls
   lock_sock(sock->sk);
which gets NULL.

All calls to xs_close are protected by XPRT_LOCKED as are most
activations of the workqueue which runs xs_tcp_setup_socket.
The exception is xs_tcp_schedule_linger_timeout.

So presumably the timeout queued by the later fires exactly when some
other code runs xs_close().

To protect against this we can move the cancel_delayed_work_sync()
call from xs_destory() to xs_close().

As xs_close is never called from the worker scheduled on
->connect_worker, this can never deadlock.

Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index ee03d35677d9..b19ba535ae1a 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -835,6 +835,8 @@ static void xs_close(struct rpc_xprt *xprt)
 
 	dprintk("RPC:       xs_close xprt %p\n", xprt);
 
+	cancel_delayed_work_sync(&transport->connect_worker);
+
 	xs_reset_transport(transport);
 	xprt->reestablish_timeout = 0;
 
@@ -869,12 +871,8 @@ static void xs_local_destroy(struct rpc_xprt *xprt)
  */
 static void xs_destroy(struct rpc_xprt *xprt)
 {
-	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
-
 	dprintk("RPC:       xs_destroy xprt %p\n", xprt);
 
-	cancel_delayed_work_sync(&transport->connect_worker);
-
 	xs_local_destroy(xprt);
 }
 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] SUNRPC: close a rare race in xs_tcp_setup_socket.
  2013-10-31  5:14 [PATCH] SUNRPC: close a rare race in xs_tcp_setup_socket NeilBrown
@ 2013-10-31 13:29 ` Myklebust, Trond
  0 siblings, 0 replies; 2+ messages in thread
From: Myklebust, Trond @ 2013-10-31 13:29 UTC (permalink / raw)
  To: NeilBrown; +Cc: NFS

On Thu, 2013-10-31 at 16:14 +1100, NeilBrown wrote:
> 
> We have one report of a crash in xs_tcp_setup_socket.
> The call path to the crash is:
> 
>   xs_tcp_setup_socket -> inet_stream_connect -> lock_sock_nested.
> 
> The 'sock' passed to that last function is NULL.
> 
> The only way I can see this happening is a concurrent call to
> xs_close:
> 
>   xs_close -> xs_reset_transport -> sock_release -> inet_release
> 
> inet_release sets:
>    sock->sk = NULL;
> inet_stream_connect calls
>    lock_sock(sock->sk);
> which gets NULL.
> 
> All calls to xs_close are protected by XPRT_LOCKED as are most
> activations of the workqueue which runs xs_tcp_setup_socket.
> The exception is xs_tcp_schedule_linger_timeout.
> 
> So presumably the timeout queued by the later fires exactly when some
> other code runs xs_close().
> 
> To protect against this we can move the cancel_delayed_work_sync()
> call from xs_destory() to xs_close().
> 
> As xs_close is never called from the worker scheduled on
> ->connect_worker, this can never deadlock.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> 
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index ee03d35677d9..b19ba535ae1a 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -835,6 +835,8 @@ static void xs_close(struct rpc_xprt *xprt)
>  
>  	dprintk("RPC:       xs_close xprt %p\n", xprt);
>  
> +	cancel_delayed_work_sync(&transport->connect_worker);
> +
>  	xs_reset_transport(transport);
>  	xprt->reestablish_timeout = 0;
>  
> @@ -869,12 +871,8 @@ static void xs_local_destroy(struct rpc_xprt *xprt)
>   */
>  static void xs_destroy(struct rpc_xprt *xprt)
>  {
> -	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
> -
>  	dprintk("RPC:       xs_destroy xprt %p\n", xprt);
>  
> -	cancel_delayed_work_sync(&transport->connect_worker);
> -
>  	xs_local_destroy(xprt);
>  }
>  

Hi Neil,

I noticed one small issue when applying: transport->connect_worker is
not initialised for the AF_LOCAL case. I therefore added a dummy
initialiser to xs_setup_local().

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2013-10-31 13:29 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-31  5:14 [PATCH] SUNRPC: close a rare race in xs_tcp_setup_socket NeilBrown
2013-10-31 13:29 ` Myklebust, Trond

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).