* BUG NULL pointer dereference in SUNRPC xs_udp_send_request
@ 2009-02-23 20:11 ` Aaron Straus
0 siblings, 0 replies; 15+ messages in thread
From: Aaron Straus @ 2009-02-23 20:11 UTC (permalink / raw)
To: bfields, neilb, linux-nfs, linux-kernel, Trond.Myklebust
Hi,
We received the trace below on one of our machines this weekend. The
machine is running vanilla 2.6.27.14.
If I'm reading the trace correctly, it looks like this line of
xs_udp_send_request:
clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
The machine is x86 (32-bit).
Please let me know if you need anything else e.g .config or full
dmesg.
Thanks for your time.
=a=
--
BUG: unable to handle kernel NULL pointer dereference at 00000008
IP: [<f89fbf5e>] :sunrpc:xs_udp_send_request+0xa3/0xca
*pdpt = 0000000035b6d001 *pde = 0000000000000000
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: nfs lockd autofs4 sunrpc ohci_hcd usbcore tg3 libphy [last unloaded: x_tables]
Pid: 1870, comm: mount Not tainted (2.6.27.14-asb #104)
EIP: 0060:[<f89fbf5e>] EFLAGS: 00010202 CPU: 2
EIP is at xs_udp_send_request+0xa3/0xca [sunrpc]
EAX: 00000000 EBX: ffffff95 ECX: 00000010 EDX: 00000000
ESI: f58ba000 EDI: f58f9000 EBP: f78df800 ESP: c66d1a80
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process mount (pid: 1870, ti=c66d0000 task=f78d4a20 task.ti=c66d0000)
Stack: f58ba004 f58ba000 f58ba054 f58ba0b4 f58f9000 f89f981a f78df800 f58f9290
f58ba074 c66d1b64 f58ba000 f78df800 f78df848 f89f83a3 f78df848 f78df800
00000000 f89fcbe2 f78df800 c66d1ae8 c66d1b04 c66d1bc0 f89f78a9 f63bf800
Call Trace:
[<f89f981a>] xprt_transmit+0xd3/0x1ad [sunrpc]
[<f89f83a3>] call_transmit+0x1b0/0x1e6 [sunrpc]
[<f89fcbe2>] __rpc_execute+0x63/0x1e1 [sunrpc]
[<f89f78a9>] rpc_run_task+0x3e/0x45 [sunrpc]
[<f89f7939>] rpc_call_sync+0x38/0x52 [sunrpc]
[<f8a02408>] rpcb_register_call+0x85/0xc7 [sunrpc]
[<f8a024f7>] rpcb_register+0xad/0xb5 [sunrpc]
[<f89fe43e>] svc_register+0xb0/0x12a [sunrpc]
[<f89ff7b2>] svc_setup_socket+0x6b/0x242 [sunrpc]
[<f89ffbde>] svc_create_socket+0x255/0x2bd [sunrpc]
[<f89f7b5d>] rpc_shutdown_client+0xa2/0xaa [sunrpc]
[<f8a02411>] rpcb_register_call+0x8e/0xc7 [sunrpc]
[<f89fc7d4>] rpc_wait_bit_killable+0x0/0x2a [sunrpc]
[<f8a024f7>] rpcb_register+0xad/0xb5 [sunrpc]
[<f89ffc57>] svc_tcp_create+0x11/0x14 [sunrpc]
[<f8a07055>] svc_create_xprt+0xd8/0x18c [sunrpc]
[<f8891f45>] make_socks+0x85/0xc2 [lockd]
[<f889216f>] lockd_up+0x80/0x145 [lockd]
[<f8890384>] nlmclnt_init+0x1c/0x4e [lockd]
[<f8a1d80b>] nfs_start_lockd+0x5f/0x7d [nfs]
[<f8a1e1dc>] nfs_create_server+0x6d9/0x9bd [nfs]
[<c02a9cd7>] sock_common_recvmsg+0x2f/0x45
[<c02a8784>] sock_recvmsg+0xc8/0xe3
[<c02a8857>] sock_sendmsg+0xb8/0xd1
[<c0114407>] kmap_atomic+0x11/0x14
[<c0143fa3>] get_page_from_freelist+0x352/0x3ca
[<c01bf5b6>] idr_get_empty_slot+0x150/0x225
[<f8a25518>] nfs_get_sb+0x651/0x87d [nfs]
[<c0148e3a>] kstrdup+0x27/0x48
[<c015f9cb>] vfs_kern_mount+0x39/0x72
[<c015fa42>] do_kern_mount+0x2f/0xb4
[<c017166f>] do_new_mount+0x55/0x89
[<c017181f>] do_mount+0x17c/0x19b
[<c030e3da>] error_code+0x72/0x78
[<c016fa36>] copy_mount_options+0x78/0x10d
[<c01718ab>] sys_mount+0x6d/0xaa
[<c0102cb9>] sysenter_do_call+0x12/0x25
=======================
Code: 83 fb e0 74 13 83 fb f5 75 1b 89 e8 e8 6c e7 ff ff bb f5 ff ff ff eb 32 8b 87 f4 02 00 00 f0 80 60 08 fe eb 25 8b 87 f4 02 00 00 <f0> 80 60 08 fe 80 3d 68 b8 a1 f8 00 79 11 89 d8 f7 d8 50 68 fa
EIP: [<f89fbf5e>] xs_udp_send_request+0xa3/0xca [sunrpc] SS:ESP 0068:c66d1a80
---[ end trace a09733af92e06ec6 ]---
--
===================
Aaron Straus
aaron-bYFJunmd+ZV8UrSeD/g0lQ@public.gmane.org
^ permalink raw reply [flat|nested] 15+ messages in thread
* BUG NULL pointer dereference in SUNRPC xs_udp_send_request
@ 2009-02-23 20:11 ` Aaron Straus
0 siblings, 0 replies; 15+ messages in thread
From: Aaron Straus @ 2009-02-23 20:11 UTC (permalink / raw)
To: bfields, neilb, linux-nfs, linux-kernel, Trond.Myklebust
Hi,
We received the trace below on one of our machines this weekend. The
machine is running vanilla 2.6.27.14.
If I'm reading the trace correctly, it looks like this line of
xs_udp_send_request:
clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
The machine is x86 (32-bit).
Please let me know if you need anything else e.g .config or full
dmesg.
Thanks for your time.
=a=
--
BUG: unable to handle kernel NULL pointer dereference at 00000008
IP: [<f89fbf5e>] :sunrpc:xs_udp_send_request+0xa3/0xca
*pdpt = 0000000035b6d001 *pde = 0000000000000000
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: nfs lockd autofs4 sunrpc ohci_hcd usbcore tg3 libphy [last unloaded: x_tables]
Pid: 1870, comm: mount Not tainted (2.6.27.14-asb #104)
EIP: 0060:[<f89fbf5e>] EFLAGS: 00010202 CPU: 2
EIP is at xs_udp_send_request+0xa3/0xca [sunrpc]
EAX: 00000000 EBX: ffffff95 ECX: 00000010 EDX: 00000000
ESI: f58ba000 EDI: f58f9000 EBP: f78df800 ESP: c66d1a80
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process mount (pid: 1870, ti=c66d0000 task=f78d4a20 task.ti=c66d0000)
Stack: f58ba004 f58ba000 f58ba054 f58ba0b4 f58f9000 f89f981a f78df800 f58f9290
f58ba074 c66d1b64 f58ba000 f78df800 f78df848 f89f83a3 f78df848 f78df800
00000000 f89fcbe2 f78df800 c66d1ae8 c66d1b04 c66d1bc0 f89f78a9 f63bf800
Call Trace:
[<f89f981a>] xprt_transmit+0xd3/0x1ad [sunrpc]
[<f89f83a3>] call_transmit+0x1b0/0x1e6 [sunrpc]
[<f89fcbe2>] __rpc_execute+0x63/0x1e1 [sunrpc]
[<f89f78a9>] rpc_run_task+0x3e/0x45 [sunrpc]
[<f89f7939>] rpc_call_sync+0x38/0x52 [sunrpc]
[<f8a02408>] rpcb_register_call+0x85/0xc7 [sunrpc]
[<f8a024f7>] rpcb_register+0xad/0xb5 [sunrpc]
[<f89fe43e>] svc_register+0xb0/0x12a [sunrpc]
[<f89ff7b2>] svc_setup_socket+0x6b/0x242 [sunrpc]
[<f89ffbde>] svc_create_socket+0x255/0x2bd [sunrpc]
[<f89f7b5d>] rpc_shutdown_client+0xa2/0xaa [sunrpc]
[<f8a02411>] rpcb_register_call+0x8e/0xc7 [sunrpc]
[<f89fc7d4>] rpc_wait_bit_killable+0x0/0x2a [sunrpc]
[<f8a024f7>] rpcb_register+0xad/0xb5 [sunrpc]
[<f89ffc57>] svc_tcp_create+0x11/0x14 [sunrpc]
[<f8a07055>] svc_create_xprt+0xd8/0x18c [sunrpc]
[<f8891f45>] make_socks+0x85/0xc2 [lockd]
[<f889216f>] lockd_up+0x80/0x145 [lockd]
[<f8890384>] nlmclnt_init+0x1c/0x4e [lockd]
[<f8a1d80b>] nfs_start_lockd+0x5f/0x7d [nfs]
[<f8a1e1dc>] nfs_create_server+0x6d9/0x9bd [nfs]
[<c02a9cd7>] sock_common_recvmsg+0x2f/0x45
[<c02a8784>] sock_recvmsg+0xc8/0xe3
[<c02a8857>] sock_sendmsg+0xb8/0xd1
[<c0114407>] kmap_atomic+0x11/0x14
[<c0143fa3>] get_page_from_freelist+0x352/0x3ca
[<c01bf5b6>] idr_get_empty_slot+0x150/0x225
[<f8a25518>] nfs_get_sb+0x651/0x87d [nfs]
[<c0148e3a>] kstrdup+0x27/0x48
[<c015f9cb>] vfs_kern_mount+0x39/0x72
[<c015fa42>] do_kern_mount+0x2f/0xb4
[<c017166f>] do_new_mount+0x55/0x89
[<c017181f>] do_mount+0x17c/0x19b
[<c030e3da>] error_code+0x72/0x78
[<c016fa36>] copy_mount_options+0x78/0x10d
[<c01718ab>] sys_mount+0x6d/0xaa
[<c0102cb9>] sysenter_do_call+0x12/0x25
=======================
Code: 83 fb e0 74 13 83 fb f5 75 1b 89 e8 e8 6c e7 ff ff bb f5 ff ff ff eb 32 8b 87 f4 02 00 00 f0 80 60 08 fe eb 25 8b 87 f4 02 00 00 <f0> 80 60 08 fe 80 3d 68 b8 a1 f8 00 79 11 89 d8 f7 d8 50 68 fa
EIP: [<f89fbf5e>] xs_udp_send_request+0xa3/0xca [sunrpc] SS:ESP 0068:c66d1a80
---[ end trace a09733af92e06ec6 ]---
--
===================
Aaron Straus
aaron@merfinllc.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: BUG NULL pointer dereference in SUNRPC xs_udp_send_request
2009-02-23 20:11 ` Aaron Straus
@ 2009-02-25 2:39 ` Ben Myers
-1 siblings, 0 replies; 15+ messages in thread
From: Ben Myers @ 2009-02-25 2:39 UTC (permalink / raw)
To: Aaron Straus; +Cc: bfields, neilb, linux-nfs, linux-kernel, Trond.Myklebust
Hi Aaron,
On Mon, Feb 23, 2009 at 12:11:09PM -0800, Aaron Straus wrote:
> We received the trace below on one of our machines this weekend. The
> machine is running vanilla 2.6.27.14.
>
> If I'm reading the trace correctly, it looks like this line of
> xs_udp_send_request:
>
> clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
>
> The machine is x86 (32-bit).
>
> Please let me know if you need anything else e.g .config or full
> dmesg.
That's a coincidence. I looked at a similar bug today that crashed on
the same line but a different stack. My suggestion is:
Index: linux/net/sunrpc/xprtsock.c
===================================================================
--- linux.orig/net/sunrpc/xprtsock.c
+++ linux/net/sunrpc/xprtsock.c
@@ -1512,14 +1512,13 @@ static void xs_udp_finish_connecting(str
sk->sk_no_check = UDP_CSUM_NORCV;
sk->sk_allocation = GFP_ATOMIC;
- xprt_set_connected(xprt);
-
/* Reset to new socket */
transport->sock = sock;
transport->inet = sk;
xs_set_memalloc(xprt);
+ xprt_set_connected(xprt);
write_unlock_bh(&sk->sk_callback_lock);
}
xs_udp_do_set_buffer_size(xprt);
Looks like xs_sendpages() returned -ENOTCONN. The above should sort
that out by returning earlier in xprt_prepare_transmit() and the rpc
would be retried by __rpc_execute().
-ben
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: BUG NULL pointer dereference in SUNRPC xs_udp_send_request
@ 2009-02-25 2:39 ` Ben Myers
0 siblings, 0 replies; 15+ messages in thread
From: Ben Myers @ 2009-02-25 2:39 UTC (permalink / raw)
To: Aaron Straus; +Cc: bfields, neilb, linux-nfs, linux-kernel, Trond.Myklebust
Hi Aaron,
On Mon, Feb 23, 2009 at 12:11:09PM -0800, Aaron Straus wrote:
> We received the trace below on one of our machines this weekend. The
> machine is running vanilla 2.6.27.14.
>
> If I'm reading the trace correctly, it looks like this line of
> xs_udp_send_request:
>
> clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
>
> The machine is x86 (32-bit).
>
> Please let me know if you need anything else e.g .config or full
> dmesg.
That's a coincidence. I looked at a similar bug today that crashed on
the same line but a different stack. My suggestion is:
Index: linux/net/sunrpc/xprtsock.c
===================================================================
--- linux.orig/net/sunrpc/xprtsock.c
+++ linux/net/sunrpc/xprtsock.c
@@ -1512,14 +1512,13 @@ static void xs_udp_finish_connecting(str
sk->sk_no_check = UDP_CSUM_NORCV;
sk->sk_allocation = GFP_ATOMIC;
- xprt_set_connected(xprt);
-
/* Reset to new socket */
transport->sock = sock;
transport->inet = sk;
xs_set_memalloc(xprt);
+ xprt_set_connected(xprt);
write_unlock_bh(&sk->sk_callback_lock);
}
xs_udp_do_set_buffer_size(xprt);
Looks like xs_sendpages() returned -ENOTCONN. The above should sort
that out by returning earlier in xprt_prepare_transmit() and the rpc
would be retried by __rpc_execute().
-ben
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: BUG NULL pointer dereference in SUNRPC xs_udp_send_request
2009-02-25 2:39 ` Ben Myers
@ 2009-02-26 0:17 ` Aaron Straus
-1 siblings, 0 replies; 15+ messages in thread
From: Aaron Straus @ 2009-02-26 0:17 UTC (permalink / raw)
To: Ben Myers; +Cc: bfields, neilb, linux-nfs, linux-kernel, Trond.Myklebust
Hi Ben,
Thanks for the response.
On Feb 24 08:39 PM, Ben Myers wrote:
> > If I'm reading the trace correctly, it looks like this line of
> > xs_udp_send_request:
> >
> > clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
>
> That's a coincidence. I looked at a similar bug today that crashed on
> the same line but a different stack. My suggestion is:
>
> Index: linux/net/sunrpc/xprtsock.c
> ===================================================================
> --- linux.orig/net/sunrpc/xprtsock.c
> +++ linux/net/sunrpc/xprtsock.c
> @@ -1512,14 +1512,13 @@ static void xs_udp_finish_connecting(str
> sk->sk_no_check = UDP_CSUM_NORCV;
> sk->sk_allocation = GFP_ATOMIC;
>
> - xprt_set_connected(xprt);
> -
> /* Reset to new socket */
> transport->sock = sock;
> transport->inet = sk;
>
> xs_set_memalloc(xprt);
>
> + xprt_set_connected(xprt);
> write_unlock_bh(&sk->sk_callback_lock);
> }
> xs_udp_do_set_buffer_size(xprt);
>
> Looks like xs_sendpages() returned -ENOTCONN. The above should sort
> that out by returning earlier in xprt_prepare_transmit() and the rpc
> would be retried by __rpc_execute().
I'll start running with it tonight to see if I can trigger the BUG
again (it was hard to hit).
Quick question, do we need a barrier between setting the transport->sock
and the xprt_set_connected(xprt)? I don't really understand the locking
on the reader side, so I cannot say...
Also, out of curiosity, do you know what changed to introduce the BUG?
Kerneloops doesn't seem to know about it before 2.6.26.3:
http://www.kerneloops.org/search.php?search=xs_udp_send_request&btnG=Function+Search
Anyway, thanks!
=a=
--
===================
Aaron Straus
aaron-bYFJunmd+ZV8UrSeD/g0lQ@public.gmane.org
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: BUG NULL pointer dereference in SUNRPC xs_udp_send_request
@ 2009-02-26 0:17 ` Aaron Straus
0 siblings, 0 replies; 15+ messages in thread
From: Aaron Straus @ 2009-02-26 0:17 UTC (permalink / raw)
To: Ben Myers; +Cc: bfields, neilb, linux-nfs, linux-kernel, Trond.Myklebust
Hi Ben,
Thanks for the response.
On Feb 24 08:39 PM, Ben Myers wrote:
> > If I'm reading the trace correctly, it looks like this line of
> > xs_udp_send_request:
> >
> > clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
>
> That's a coincidence. I looked at a similar bug today that crashed on
> the same line but a different stack. My suggestion is:
>
> Index: linux/net/sunrpc/xprtsock.c
> ===================================================================
> --- linux.orig/net/sunrpc/xprtsock.c
> +++ linux/net/sunrpc/xprtsock.c
> @@ -1512,14 +1512,13 @@ static void xs_udp_finish_connecting(str
> sk->sk_no_check = UDP_CSUM_NORCV;
> sk->sk_allocation = GFP_ATOMIC;
>
> - xprt_set_connected(xprt);
> -
> /* Reset to new socket */
> transport->sock = sock;
> transport->inet = sk;
>
> xs_set_memalloc(xprt);
>
> + xprt_set_connected(xprt);
> write_unlock_bh(&sk->sk_callback_lock);
> }
> xs_udp_do_set_buffer_size(xprt);
>
> Looks like xs_sendpages() returned -ENOTCONN. The above should sort
> that out by returning earlier in xprt_prepare_transmit() and the rpc
> would be retried by __rpc_execute().
I'll start running with it tonight to see if I can trigger the BUG
again (it was hard to hit).
Quick question, do we need a barrier between setting the transport->sock
and the xprt_set_connected(xprt)? I don't really understand the locking
on the reader side, so I cannot say...
Also, out of curiosity, do you know what changed to introduce the BUG?
Kerneloops doesn't seem to know about it before 2.6.26.3:
http://www.kerneloops.org/search.php?search=xs_udp_send_request&btnG=Function+Search
Anyway, thanks!
=a=
--
===================
Aaron Straus
aaron@merfinllc.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] sunrpc: xprt is not connected until after sock is set
[not found] ` <20090226001744.GB7613-bYFJunmd+ZV8UrSeD/g0lQ@public.gmane.org>
@ 2009-02-27 23:54 ` Ben Myers
2009-02-28 0:37 ` Trond Myklebust
0 siblings, 1 reply; 15+ messages in thread
From: Ben Myers @ 2009-02-27 23:54 UTC (permalink / raw)
To: Aaron Straus; +Cc: bfields, linux-nfs, neilb, Trond.Myklebust
Hi Aaron,
I'm just getting back to this.
On Wed, Feb 25, 2009 at 04:17:45PM -0800, Aaron Straus wrote:
> Quick question, do we need a barrier between setting the transport->sock
> and the xprt_set_connected(xprt)?
Yeah, looks that way. That was some interesting reading. Here is the
patch as it stands now. Hopefully I got that right. This has
apparently fixed the problem on ia64 even with out the barriers. rpciod
racing with a write.
> Also, out of curiosity, do you know what changed to introduce the BUG?
I haven't looked into that.
Bruce, can you take this patch? I understand that you're the
maintainer?
Thanks!
Ben
xs_sendpages() returned -ENOTCONN to xs_udp_send_request() and we tried to
clear a bit in transport->sock->flags when sock hadn't been set. With this
change we will instead return earlier in xprt_prepare_transmit() and the rpc
will be retried.
---
include/linux/sunrpc/xprt.h | 9 ++++++++-
net/sunrpc/xprtsock.c | 3 +--
2 files changed, 9 insertions(+), 3 deletions(-)
Index: linux-2.6.27.15-2/include/linux/sunrpc/xprt.h
===================================================================
--- linux-2.6.27.15-2.orig/include/linux/sunrpc/xprt.h
+++ linux-2.6.27.15-2/include/linux/sunrpc/xprt.h
@@ -266,17 +266,24 @@ int xs_swapper(struct rpc_xprt *xprt,
static inline void xprt_set_connected(struct rpc_xprt *xprt)
{
+ smp_wmb();
set_bit(XPRT_CONNECTED, &xprt->state);
}
static inline void xprt_clear_connected(struct rpc_xprt *xprt)
{
clear_bit(XPRT_CONNECTED, &xprt->state);
+ smp_wmb();
}
static inline int xprt_connected(struct rpc_xprt *xprt)
{
- return test_bit(XPRT_CONNECTED, &xprt->state);
+ int connected;
+
+ connected = test_bit(XPRT_CONNECTED, &xprt->state);
+ smp_rmb();
+
+ return connected;
}
static inline int xprt_test_and_set_connected(struct rpc_xprt *xprt)
Index: linux-2.6.27.15-2/net/sunrpc/xprtsock.c
===================================================================
--- linux-2.6.27.15-2.orig/net/sunrpc/xprtsock.c
+++ linux-2.6.27.15-2/net/sunrpc/xprtsock.c
@@ -1512,14 +1512,13 @@ static void xs_udp_finish_connecting(str
sk->sk_no_check = UDP_CSUM_NORCV;
sk->sk_allocation = GFP_ATOMIC;
- xprt_set_connected(xprt);
-
/* Reset to new socket */
transport->sock = sock;
transport->inet = sk;
xs_set_memalloc(xprt);
+ xprt_set_connected(xprt);
write_unlock_bh(&sk->sk_callback_lock);
}
xs_udp_do_set_buffer_size(xprt);
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] sunrpc: xprt is not connected until after sock is set
2009-02-27 23:54 ` [PATCH] sunrpc: xprt is not connected until after sock is set Ben Myers
@ 2009-02-28 0:37 ` Trond Myklebust
[not found] ` <1235781463.20549.33.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Trond Myklebust @ 2009-02-28 0:37 UTC (permalink / raw)
To: Ben Myers; +Cc: Aaron Straus, bfields, linux-nfs, neilb
On Fri, 2009-02-27 at 17:54 -0600, Ben Myers wrote:
> Hi Aaron,
>
> I'm just getting back to this.
>
> On Wed, Feb 25, 2009 at 04:17:45PM -0800, Aaron Straus wrote:
> > Quick question, do we need a barrier between setting the transport->sock
> > and the xprt_set_connected(xprt)?
>
> Yeah, looks that way. That was some interesting reading. Here is the
> patch as it stands now. Hopefully I got that right. This has
> apparently fixed the problem on ia64 even with out the barriers. rpciod
> racing with a write.
>
> > Also, out of curiosity, do you know what changed to introduce the BUG?
>
> I haven't looked into that.
>
> Bruce, can you take this patch? I understand that you're the
> maintainer?
>
> Thanks!
> Ben
>
> xs_sendpages() returned -ENOTCONN to xs_udp_send_request() and we tried to
> clear a bit in transport->sock->flags when sock hadn't been set. With this
> change we will instead return earlier in xprt_prepare_transmit() and the rpc
> will be retried.
> ---
> include/linux/sunrpc/xprt.h | 9 ++++++++-
> net/sunrpc/xprtsock.c | 3 +--
> 2 files changed, 9 insertions(+), 3 deletions(-)
>
> Index: linux-2.6.27.15-2/include/linux/sunrpc/xprt.h
> ===================================================================
> --- linux-2.6.27.15-2.orig/include/linux/sunrpc/xprt.h
> +++ linux-2.6.27.15-2/include/linux/sunrpc/xprt.h
> @@ -266,17 +266,24 @@ int xs_swapper(struct rpc_xprt *xprt,
>
> static inline void xprt_set_connected(struct rpc_xprt *xprt)
> {
> + smp_wmb();
> set_bit(XPRT_CONNECTED, &xprt->state);
> }
>
> static inline void xprt_clear_connected(struct rpc_xprt *xprt)
> {
> clear_bit(XPRT_CONNECTED, &xprt->state);
> + smp_wmb();
> }
>
> static inline int xprt_connected(struct rpc_xprt *xprt)
> {
> - return test_bit(XPRT_CONNECTED, &xprt->state);
> + int connected;
> +
> + connected = test_bit(XPRT_CONNECTED, &xprt->state);
> + smp_rmb();
> +
> + return connected;
> }
NACK. If you need a memory barrier, put it in the UDP case only (and
justify why). The TCP case sets and clears the above in the
connect/disconnect softirqs and so does not require them.
I certainly see no reason whatsoever for adding memory barriers to
xprt_connected() or xprt_clear_connected().
> static inline int xprt_test_and_set_connected(struct rpc_xprt *xprt)
> Index: linux-2.6.27.15-2/net/sunrpc/xprtsock.c
> ===================================================================
> --- linux-2.6.27.15-2.orig/net/sunrpc/xprtsock.c
> +++ linux-2.6.27.15-2/net/sunrpc/xprtsock.c
> @@ -1512,14 +1512,13 @@ static void xs_udp_finish_connecting(str
> sk->sk_no_check = UDP_CSUM_NORCV;
> sk->sk_allocation = GFP_ATOMIC;
>
> - xprt_set_connected(xprt);
> -
> /* Reset to new socket */
> transport->sock = sock;
> transport->inet = sk;
>
> xs_set_memalloc(xprt);
>
> + xprt_set_connected(xprt);
> write_unlock_bh(&sk->sk_callback_lock);
> }
> xs_udp_do_set_buffer_size(xprt);
Please move that call to xprt_set_connected() outside the if statement.
We want to set the connected flag unconditionally here.
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] sunrpc: xprt is not connected until after sock is set
[not found] ` <1235781463.20549.33.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2009-02-28 1:34 ` Aaron Straus
[not found] ` <20090228013457.GF7706-bYFJunmd+ZV8UrSeD/g0lQ@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Aaron Straus @ 2009-02-28 1:34 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Ben Myers, bfields, linux-nfs, neilb
Hi Trond,
On Feb 27 07:37 PM, Trond Myklebust wrote:
> NACK. If you need a memory barrier, put it in the UDP case only (and
> justify why). The TCP case sets and clears the above in the
> connect/disconnect softirqs and so does not require them.
>
> I certainly see no reason whatsoever for adding memory barriers to
> xprt_connected() or xprt_clear_connected().
>
> > static inline int xprt_test_and_set_connected(struct rpc_xprt *xprt)
> > Index: linux-2.6.27.15-2/net/sunrpc/xprtsock.c
> > ===================================================================
> > --- linux-2.6.27.15-2.orig/net/sunrpc/xprtsock.c
> > +++ linux-2.6.27.15-2/net/sunrpc/xprtsock.c
> > @@ -1512,14 +1512,13 @@ static void xs_udp_finish_connecting(str
> > sk->sk_no_check = UDP_CSUM_NORCV;
> > sk->sk_allocation = GFP_ATOMIC;
> >
> > - xprt_set_connected(xprt);
> > -
> > /* Reset to new socket */
> > transport->sock = sock;
> > transport->inet = sk;
> >
> > xs_set_memalloc(xprt);
> >
> > + xprt_set_connected(xprt);
> > write_unlock_bh(&sk->sk_callback_lock);
> > }
> > xs_udp_do_set_buffer_size(xprt);
>
> Please move that call to xprt_set_connected() outside the if statement.
> We want to set the connected flag unconditionally here.
FYI, I've been digging a bit. This commit adds the clearing of the
bits:
commit b6ddf64ffe9d59577a9176856bb6fe69a539f573
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Thu Apr 17 18:52:19 2008 -0400
SUNRPC: Fix up xprt_write_space()
<snip>
@@ -588,19 +603,20 @@ static int xs_udp_send_request(struct rpc_task *task)
}
switch (status) {
+ case -EAGAIN:
+ xs_nospace(task);
+ break;
case -ENETUNREACH:
case -EPIPE:
case -ECONNREFUSED:
/* When the server has died, an ICMP port unreachable message
* prompts ECONNREFUSED. */
- break;
- case -EAGAIN:
- xs_nospace(task);
+ clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
break;
default:
+ clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
dprintk("RPC: sendmsg returned unrecognized error %d\n",
-status);
- break;
}
return status;
This went in to 2.6.26 which was the first time we saw the bug
(2.6.26.3) on kerneloops.
I don't know if *this* is a bad commit or some other locking changed in
2.6.26 which tickles the bug.
Hope it helps. Have a good weekend all!
=a=
--
===================
Aaron Straus
aaron-bYFJunmd+ZV8UrSeD/g0lQ@public.gmane.org
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] sunrpc: xprt is not connected until after sock is set
[not found] ` <20090228013457.GF7706-bYFJunmd+ZV8UrSeD/g0lQ@public.gmane.org>
@ 2009-02-28 1:40 ` Trond Myklebust
[not found] ` <1235785237.20549.51.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Trond Myklebust @ 2009-02-28 1:40 UTC (permalink / raw)
To: Aaron Straus; +Cc: Ben Myers, bfields, linux-nfs, neilb
On Fri, 2009-02-27 at 17:34 -0800, Aaron Straus wrote:
> Hi Trond,
>
> On Feb 27 07:37 PM, Trond Myklebust wrote:
> > NACK. If you need a memory barrier, put it in the UDP case only (and
> > justify why). The TCP case sets and clears the above in the
> > connect/disconnect softirqs and so does not require them.
> >
> > I certainly see no reason whatsoever for adding memory barriers to
> > xprt_connected() or xprt_clear_connected().
> >
> > > static inline int xprt_test_and_set_connected(struct rpc_xprt *xprt)
> > > Index: linux-2.6.27.15-2/net/sunrpc/xprtsock.c
> > > ===================================================================
> > > --- linux-2.6.27.15-2.orig/net/sunrpc/xprtsock.c
> > > +++ linux-2.6.27.15-2/net/sunrpc/xprtsock.c
> > > @@ -1512,14 +1512,13 @@ static void xs_udp_finish_connecting(str
> > > sk->sk_no_check = UDP_CSUM_NORCV;
> > > sk->sk_allocation = GFP_ATOMIC;
> > >
> > > - xprt_set_connected(xprt);
> > > -
> > > /* Reset to new socket */
> > > transport->sock = sock;
> > > transport->inet = sk;
> > >
> > > xs_set_memalloc(xprt);
> > >
> > > + xprt_set_connected(xprt);
> > > write_unlock_bh(&sk->sk_callback_lock);
> > > }
> > > xs_udp_do_set_buffer_size(xprt);
> >
> > Please move that call to xprt_set_connected() outside the if statement.
> > We want to set the connected flag unconditionally here.
>
> FYI, I've been digging a bit. This commit adds the clearing of the
> bits:
>
> commit b6ddf64ffe9d59577a9176856bb6fe69a539f573
> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date: Thu Apr 17 18:52:19 2008 -0400
>
> SUNRPC: Fix up xprt_write_space()
>
> <snip>
>
> @@ -588,19 +603,20 @@ static int xs_udp_send_request(struct rpc_task *task)
> }
>
> switch (status) {
> + case -EAGAIN:
> + xs_nospace(task);
> + break;
> case -ENETUNREACH:
> case -EPIPE:
> case -ECONNREFUSED:
> /* When the server has died, an ICMP port unreachable message
> * prompts ECONNREFUSED. */
> - break;
> - case -EAGAIN:
> - xs_nospace(task);
> + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
> break;
> default:
> + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
> dprintk("RPC: sendmsg returned unrecognized error %d\n",
> -status);
> - break;
> }
>
> return status;
>
>
> This went in to 2.6.26 which was the first time we saw the bug
> (2.6.26.3) on kerneloops.
>
> I don't know if *this* is a bad commit or some other locking changed in
> 2.6.26 which tickles the bug.
It should be unrelated. If the client managed to get past the call to
xs_sendpages(), then the UDP socket must obviously exist already.
Can you show me the oops?
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] sunrpc: xprt is not connected until after sock is set
[not found] ` <1235785237.20549.51.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2009-02-28 4:57 ` Aaron Straus
2009-02-28 5:07 ` Aaron Straus
1 sibling, 0 replies; 15+ messages in thread
From: Aaron Straus @ 2009-02-28 4:57 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Ben Myers, bfields, linux-nfs, neilb
Hi,
On Feb 27 08:40 PM, Trond Myklebust wrote:
> > This went in to 2.6.26 which was the first time we saw the bug
> > (2.6.26.3) on kerneloops.
> >
> > I don't know if *this* is a bad commit or some other locking changed in
> > 2.6.26 which tickles the bug.
>
> It should be unrelated. If the client managed to get past the call to
> xs_sendpages(), then the UDP socket must obviously exist already.
>
> Can you show me the oops?
Sure, please see the original e-mail below with the oops at the bottom.
Also if you follow this link:
http://www.kerneloops.org/search.php?search=xs_udp_send_request&btnG=Function+Search
you can see other oops (with other stack traces) on kerneloops.
Thanks for your time!!
=a=
----
From: Aaron Straus <aaron-bYFJunmd+ZV8UrSeD/g0lQ@public.gmane.org>
Subject: BUG NULL pointer dereference in SUNRPC xs_udp_send_request
Message-ID: <20090223201108.GB3308-bYFJunmd+ZV8UrSeD/g0lQ@public.gmane.org>
Hi,
We received the trace below on one of our machines this weekend. The
machine is running vanilla 2.6.27.14.
If I'm reading the trace correctly, it looks like this line of
xs_udp_send_request:
clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
The machine is x86 (32-bit).
Please let me know if you need anything else e.g .config or full
dmesg.
Thanks for your time.
=a=
--
BUG: unable to handle kernel NULL pointer dereference at 00000008
IP: [<f89fbf5e>] :sunrpc:xs_udp_send_request+0xa3/0xca
*pdpt = 0000000035b6d001 *pde = 0000000000000000
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: nfs lockd autofs4 sunrpc ohci_hcd usbcore tg3 libphy [last unloaded: x_tables]
Pid: 1870, comm: mount Not tainted (2.6.27.14-asb #104)
EIP: 0060:[<f89fbf5e>] EFLAGS: 00010202 CPU: 2
EIP is at xs_udp_send_request+0xa3/0xca [sunrpc]
EAX: 00000000 EBX: ffffff95 ECX: 00000010 EDX: 00000000
ESI: f58ba000 EDI: f58f9000 EBP: f78df800 ESP: c66d1a80
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process mount (pid: 1870, ti=c66d0000 task=f78d4a20 task.ti=c66d0000)
Stack: f58ba004 f58ba000 f58ba054 f58ba0b4 f58f9000 f89f981a f78df800 f58f9290
f58ba074 c66d1b64 f58ba000 f78df800 f78df848 f89f83a3 f78df848 f78df800
00000000 f89fcbe2 f78df800 c66d1ae8 c66d1b04 c66d1bc0 f89f78a9 f63bf800
Call Trace:
[<f89f981a>] xprt_transmit+0xd3/0x1ad [sunrpc]
[<f89f83a3>] call_transmit+0x1b0/0x1e6 [sunrpc]
[<f89fcbe2>] __rpc_execute+0x63/0x1e1 [sunrpc]
[<f89f78a9>] rpc_run_task+0x3e/0x45 [sunrpc]
[<f89f7939>] rpc_call_sync+0x38/0x52 [sunrpc]
[<f8a02408>] rpcb_register_call+0x85/0xc7 [sunrpc]
[<f8a024f7>] rpcb_register+0xad/0xb5 [sunrpc]
[<f89fe43e>] svc_register+0xb0/0x12a [sunrpc]
[<f89ff7b2>] svc_setup_socket+0x6b/0x242 [sunrpc]
[<f89ffbde>] svc_create_socket+0x255/0x2bd [sunrpc]
[<f89f7b5d>] rpc_shutdown_client+0xa2/0xaa [sunrpc]
[<f8a02411>] rpcb_register_call+0x8e/0xc7 [sunrpc]
[<f89fc7d4>] rpc_wait_bit_killable+0x0/0x2a [sunrpc]
[<f8a024f7>] rpcb_register+0xad/0xb5 [sunrpc]
[<f89ffc57>] svc_tcp_create+0x11/0x14 [sunrpc]
[<f8a07055>] svc_create_xprt+0xd8/0x18c [sunrpc]
[<f8891f45>] make_socks+0x85/0xc2 [lockd]
[<f889216f>] lockd_up+0x80/0x145 [lockd]
[<f8890384>] nlmclnt_init+0x1c/0x4e [lockd]
[<f8a1d80b>] nfs_start_lockd+0x5f/0x7d [nfs]
[<f8a1e1dc>] nfs_create_server+0x6d9/0x9bd [nfs]
[<c02a9cd7>] sock_common_recvmsg+0x2f/0x45
[<c02a8784>] sock_recvmsg+0xc8/0xe3
[<c02a8857>] sock_sendmsg+0xb8/0xd1
[<c0114407>] kmap_atomic+0x11/0x14
[<c0143fa3>] get_page_from_freelist+0x352/0x3ca
[<c01bf5b6>] idr_get_empty_slot+0x150/0x225
[<f8a25518>] nfs_get_sb+0x651/0x87d [nfs]
[<c0148e3a>] kstrdup+0x27/0x48
[<c015f9cb>] vfs_kern_mount+0x39/0x72
[<c015fa42>] do_kern_mount+0x2f/0xb4
[<c017166f>] do_new_mount+0x55/0x89
[<c017181f>] do_mount+0x17c/0x19b
[<c030e3da>] error_code+0x72/0x78
[<c016fa36>] copy_mount_options+0x78/0x10d
[<c01718ab>] sys_mount+0x6d/0xaa
[<c0102cb9>] sysenter_do_call+0x12/0x25
=======================
Code: 83 fb e0 74 13 83 fb f5 75 1b 89 e8 e8 6c e7 ff ff bb f5 ff ff ff eb 32 8b 87 f4 02 00 00 f0 80 60 08 fe eb 25 8b 87 f4 02 00 00 <f0> 80 60 08 fe 80 3d 68 b8 a1 f8 00 79 11 89 d8 f7 d8 50 68 fa
EIP: [<f89fbf5e>] xs_udp_send_request+0xa3/0xca [sunrpc] SS:ESP 0068:c66d1a80
---[ end trace a09733af92e06ec6 ]---
--
===================
Aaron Straus
aaron-bYFJunmd+ZV8UrSeD/g0lQ@public.gmane.org
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] sunrpc: xprt is not connected until after sock is set
[not found] ` <1235785237.20549.51.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2009-02-28 4:57 ` Aaron Straus
@ 2009-02-28 5:07 ` Aaron Straus
[not found] ` <20090228050707.GB22330-bYFJunmd+ZV8UrSeD/g0lQ@public.gmane.org>
1 sibling, 1 reply; 15+ messages in thread
From: Aaron Straus @ 2009-02-28 5:07 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Ben Myers, bfields, linux-nfs, neilb
On Feb 27 08:40 PM, Trond Myklebust wrote:
> > > > static inline int xprt_test_and_set_connected(struct rpc_xprt *xprt)
> > > > Index: linux-2.6.27.15-2/net/sunrpc/xprtsock.c
> > > > ===================================================================
> > > > --- linux-2.6.27.15-2.orig/net/sunrpc/xprtsock.c
> > > > +++ linux-2.6.27.15-2/net/sunrpc/xprtsock.c
> > > > @@ -1512,14 +1512,13 @@ static void xs_udp_finish_connecting(str
> > > > sk->sk_no_check = UDP_CSUM_NORCV;
> > > > sk->sk_allocation = GFP_ATOMIC;
> > > >
> > > > - xprt_set_connected(xprt);
> > > > -
> > > > /* Reset to new socket */
> > > > transport->sock = sock;
> > > > transport->inet = sk;
> > > >
> > > > xs_set_memalloc(xprt);
> > > >
> > > > + xprt_set_connected(xprt);
> > > > write_unlock_bh(&sk->sk_callback_lock);
> > > > }
> > > > xs_udp_do_set_buffer_size(xprt);
> > >
> >
> > @@ -588,19 +603,20 @@ static int xs_udp_send_request(struct rpc_task *task)
> > }
> >
> > switch (status) {
> > + case -EAGAIN:
> > + xs_nospace(task);
> > + break;
> > case -ENETUNREACH:
> > case -EPIPE:
> > case -ECONNREFUSED:
> > /* When the server has died, an ICMP port unreachable message
> > * prompts ECONNREFUSED. */
> > - break;
> > - case -EAGAIN:
> > - xs_nospace(task);
> > + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
> > break;
> > default:
> > + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
> > dprintk("RPC: sendmsg returned unrecognized error %d\n",
> > -status);
> > - break;
> > }
> >
> > return status;
> >
> > This went in to 2.6.26 which was the first time we saw the bug
> > (2.6.26.3) on kerneloops.
> >
> > I don't know if *this* is a bad commit or some other locking changed in
> > 2.6.26 which tickles the bug.
>
> It should be unrelated. If the client managed to get past the call to
> xs_sendpages(), then the UDP socket must obviously exist already.
PS just to repeat the previous discussion.
It seems like xs_sendpages does check if sock is NULL and returns
-ENOTCONN in that case.
The problem is that now in xs_udp_send_request falls into the default:
section of that switch statement and tries to do the:
> > + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
but sock is NULL, so I think that's the oops.
Anyway, that's why Ben thinks we need to move xprt_set_connected() down
(past the setting of transport->sock) in xs_udp_finish_connecting().
I was worried without a barrier xprt_set_connected() could just move
back up before the setting of transport->sock again either by compiler
or CPU reordering.
Anyway that's where we are....
thanks!
=a=
--
===================
Aaron Straus
aaron-bYFJunmd+ZV8UrSeD/g0lQ@public.gmane.org
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] sunrpc: xprt is not connected until after sock is set
[not found] ` <20090228050707.GB22330-bYFJunmd+ZV8UrSeD/g0lQ@public.gmane.org>
@ 2009-02-28 18:09 ` Trond Myklebust
[not found] ` <1235844568.7677.9.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Trond Myklebust @ 2009-02-28 18:09 UTC (permalink / raw)
To: Aaron Straus; +Cc: Ben Myers, bfields, linux-nfs, neilb
On Fri, 2009-02-27 at 21:07 -0800, Aaron Straus wrote:
> On Feb 27 08:40 PM, Trond Myklebust wrote:
> > > > > static inline int xprt_test_and_set_connected(struct rpc_xprt *xprt)
> > > > > Index: linux-2.6.27.15-2/net/sunrpc/xprtsock.c
> > > > > ===================================================================
> > > > > --- linux-2.6.27.15-2.orig/net/sunrpc/xprtsock.c
> > > > > +++ linux-2.6.27.15-2/net/sunrpc/xprtsock.c
> > > > > @@ -1512,14 +1512,13 @@ static void xs_udp_finish_connecting(str
> > > > > sk->sk_no_check = UDP_CSUM_NORCV;
> > > > > sk->sk_allocation = GFP_ATOMIC;
> > > > >
> > > > > - xprt_set_connected(xprt);
> > > > > -
> > > > > /* Reset to new socket */
> > > > > transport->sock = sock;
> > > > > transport->inet = sk;
> > > > >
> > > > > xs_set_memalloc(xprt);
> > > > >
> > > > > + xprt_set_connected(xprt);
> > > > > write_unlock_bh(&sk->sk_callback_lock);
> > > > > }
> > > > > xs_udp_do_set_buffer_size(xprt);
> > > >
> > >
> > > @@ -588,19 +603,20 @@ static int xs_udp_send_request(struct rpc_task *task)
> > > }
> > >
> > > switch (status) {
> > > + case -EAGAIN:
> > > + xs_nospace(task);
> > > + break;
> > > case -ENETUNREACH:
> > > case -EPIPE:
> > > case -ECONNREFUSED:
> > > /* When the server has died, an ICMP port unreachable message
> > > * prompts ECONNREFUSED. */
> > > - break;
> > > - case -EAGAIN:
> > > - xs_nospace(task);
> > > + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
> > > break;
> > > default:
> > > + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
> > > dprintk("RPC: sendmsg returned unrecognized error %d\n",
> > > -status);
> > > - break;
> > > }
> > >
> > > return status;
> > >
> > > This went in to 2.6.26 which was the first time we saw the bug
> > > (2.6.26.3) on kerneloops.
> > >
> > > I don't know if *this* is a bad commit or some other locking changed in
> > > 2.6.26 which tickles the bug.
> >
> > It should be unrelated. If the client managed to get past the call to
> > xs_sendpages(), then the UDP socket must obviously exist already.
>
> PS just to repeat the previous discussion.
>
> It seems like xs_sendpages does check if sock is NULL and returns
> -ENOTCONN in that case.
>
> The problem is that now in xs_udp_send_request falls into the default:
> section of that switch statement and tries to do the:
>
> > > + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
>
> but sock is NULL, so I think that's the oops.
OK, that makes a lot of sense. We should definitely fix up both
xs_tcp_send_request() and xs_udp_send_request() to deal correctly with
that error.
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] sunrpc: xprt is not connected until after sock is set
[not found] ` <1235844568.7677.9.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2009-03-02 16:36 ` Ben Myers
2009-03-02 16:39 ` Chuck Lever
0 siblings, 1 reply; 15+ messages in thread
From: Ben Myers @ 2009-03-02 16:36 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Aaron Straus, bfields, linux-nfs, neilb
Hi Trond,
On Sat, Feb 28, 2009 at 01:09:28PM -0500, Trond Myklebust wrote:
> On Fri, 2009-02-27 at 21:07 -0800, Aaron Straus wrote:
> > It seems like xs_sendpages does check if sock is NULL and returns
> > -ENOTCONN in that case.
Here's a trace that shows the above:
Starting kernel based NFS server: idmapd mountd statd nfsdrpc.nfsd[4502]: NaT co
nsumption 17179869216 [1]
Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs ib_ipoib ib_cm
ib_sa ipv6 ib_uverbs ib_umad iw_cxgb3 cxgb3 mlx4_en mlx4_ib mlx4_core binfmt_mi
sc fuse dm_crypt crypto_blkcipher nls_iso8859_1 loop dm_mod sr_mod cdrom ib_mthc
a tg3 ib_mad shpchp sg ib_core button libphy pci_hotplug qla2xxx mptctl usbhid h
id ff_memless ohci_hcd ehci_hcd usbcore sd_mod crc_t10dif ide_generic mptfc scsi
_transport_fc scsi_tgt mptspi scsi_transport_spi qla1280 sata_vsc sgiioc4 ioc4 x
fs fan ide_pci_generic siimage ide_core mptsas mptscsih mptbase scsi_transport_s
as thermal processor thermal_sys hwmon pata_sil680 libata scsi_mod dock
Supported: Yes
Pid: 4502, CPU 2, comm: rpc.nfsd
psr : 0000101008526030 ifs : 800000000000040d ip : [<a0000002044acf30>] Not
tainted (2.6.27.15-2-default)
ip is at xs_udp_send_request+0x270/0x340 [sunrpc]
unat: 0000000000000000 pfs : 000000000000040d rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr : aa99aaa6aa5aa999
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a0000002044acd30 b6 : a0000002044accc0 b7 : a0000002044c8fe0
f6 : 1003e000000000000222e f7 : 1003e00000000000004e2
f8 : 1003e00000000000009c4 f9 : 1003e0000001f3eb1b6ce
f10 : 1003e93b691609417c0ec f11 : 1003e0000000000000013
r1 : a0000002044d9760 r2 : a000000204511ba0 r3 : a000000204511ba0
r8 : ffffffffffffff95 r9 : e0000060431825a8 r10 : a000000204511ce0
r11 : 0000000000000000 r12 : e0000060434cfc30 r13 : e0000060434c0000
r14 : 0000000000000000 r15 : e00000607bd79198 r16 : a0000002044e9fa8
r17 : e000006078fc6040 r18 : 0000000000000054 r19 : e000006043182668
r20 : ffffffffffffff95 r21 : 00000001000023ca r22 : e000006078fc6148
r23 : a000000100efa780 r24 : a000000204511ce0 r25 : 0000000000000000
r26 : e000006078fc613c r27 : e00000607bd795a8 r28 : 0000000000000000
r29 : e00000607bd796e0 r30 : 0000000000000000 r31 : e00000607b3dc000
Call Trace:
[<a000000100016a30>] show_stack+0x50/0xa0
sp=e0000060434cf790 bsp=e0000060434c1720
[<a0000001000172a0>] show_regs+0x820/0x860
sp=e0000060434cf960 bsp=e0000060434c16c8
[<a0000001000266a0>] die+0x1a0/0x2e0
sp=e0000060434cf960 bsp=e0000060434c1688
[<a000000100026830>] die_if_kernel+0x50/0x80
sp=e0000060434cf960 bsp=e0000060434c1658
[<a0000001006e79a0>] ia64_fault+0xac0/0xb60
sp=e0000060434cf960 bsp=e0000060434c1610
[<a00000010000c7a0>] ia64_native_leave_kernel+0x0/0x270
sp=e0000060434cfa60 bsp=e0000060434c1610
[<a0000002044acf30>] xs_udp_send_request+0x270/0x340 [sunrpc]
sp=e0000060434cfc30 bsp=e0000060434c15a0
[<a0000002044a9a20>] xprt_transmit+0x3a0/0x620 [sunrpc]
sp=e0000060434cfc30 bsp=e0000060434c1560
[<a0000002044a28a0>] call_transmit+0x600/0x720 [sunrpc]
sp=e0000060434cfc30 bsp=e0000060434c1520
[<a0000002044b5690>] __rpc_execute+0x1d0/0x7a0 [sunrpc]
sp=e0000060434cfc30 bsp=e0000060434c14a0
[<a0000002044b5ce0>] rpc_execute+0x80/0xa0 [sunrpc]
sp=e0000060434cfc30 bsp=e0000060434c1480
[<a0000002044a43d0>] rpc_run_task+0xf0/0x120 [sunrpc]
sp=e0000060434cfc30 bsp=e0000060434c1460
[<a0000002044a46e0>] rpc_call_sync+0xe0/0x160 [sunrpc]
sp=e0000060434cfc30 bsp=e0000060434c1430
[<a0000002044ca500>] rpcb_register_call+0x120/0x220 [sunrpc]
sp=e0000060434cfc70 bsp=e0000060434c13e8
[<a0000002044ca7b0>] rpcb_register+0x1b0/0x1e0 [sunrpc]
sp=e0000060434cfcc0 bsp=e0000060434c1398
[<a0000002044bd200>] svc_register+0x1e0/0x360 [sunrpc]
sp=e0000060434cfd20 bsp=e0000060434c1310
[<a0000002044c0c50>] svc_setup_socket+0x150/0x9a0 [sunrpc]
sp=e0000060434cfd30 bsp=e0000060434c12c0
[<a0000002044c3e10>] svc_addsock+0x1d0/0x480 [sunrpc]
sp=e0000060434cfd40 bsp=e0000060434c1270
[<a000000204b52550>] write_ports+0x1f0/0x6c0 [nfsd]
sp=e0000060434cfdd0 bsp=e0000060434c1228
[<a000000204b54270>] nfsctl_transaction_write+0xf0/0x1c0 [nfsd]
sp=e0000060434cfe20 bsp=e0000060434c11e8
[<a0000001001b6a90>] vfs_write+0x1b0/0x360
sp=e0000060434cfe20 bsp=e0000060434c1198
[<a0000001001b6de0>] sys_write+0x80/0x100
sp=e0000060434cfe20 bsp=e0000060434c1120
[<a00000010000c600>] ia64_ret_from_syscall+0x0/0x20
sp=e0000060434cfe30 bsp=e0000060434c1120
[<a000000000010720>] __kernel_syscall_via_break+0x0/0x20
sp=e0000060434d0000 bsp=e0000060434c1120
If I read it correctly the xs_sendpages return value is in r8, -ENOTCONN. We
had xs_udp_finish_connecting in rpciod racing with the above write.
> > The problem is that now in xs_udp_send_request falls into the default:
> > section of that switch statement and tries to do the:
> >
> > > > + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
> >
> > but sock is NULL, so I think that's the oops.
>
> OK, that makes a lot of sense. We should definitely fix up both
> xs_tcp_send_request() and xs_udp_send_request() to deal correctly with
> that error.
With the patch as it stands I think we will not get into
xs_udp_send_request because xprt_prepare_transmit will have returned
-ENOTCONN and the rpc retried in __rpc_execute:
rpc_execute
__rpc_execute
call_transmit
| xprt_prepare_transmit
| if (!xprt_connected) return -ENOTCONN *here
| xprt_transmit
| xs_udp_send_request
It looks like we just need to protect sock_xprt->sock with respect to
the value of rpc_xprt->state, and Aaron suggested a barrier to do that.
Alternatively we could add a lock to rpc_xprt.
What is your recommendation?
Thanks!
Ben
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] sunrpc: xprt is not connected until after sock is set
2009-03-02 16:36 ` Ben Myers
@ 2009-03-02 16:39 ` Chuck Lever
0 siblings, 0 replies; 15+ messages in thread
From: Chuck Lever @ 2009-03-02 16:39 UTC (permalink / raw)
To: Trond Myklebust
Cc: Aaron Straus, Ben Myers, J. Bruce Fields, Linux NFS Mailing List,
Neil Brown
On Mar 2, 2009, at Mar 2, 2009, 11:36 AM, Ben Myers wrote:
> Hi Trond,
>
> On Sat, Feb 28, 2009 at 01:09:28PM -0500, Trond Myklebust wrote:
>> On Fri, 2009-02-27 at 21:07 -0800, Aaron Straus wrote:
>>> It seems like xs_sendpages does check if sock is NULL and returns
>>> -ENOTCONN in that case.
>
> Here's a trace that shows the above:
>
> Starting kernel based NFS server: idmapd mountd statd
> nfsdrpc.nfsd[4502]: NaT co
> nsumption 17179869216 [1]
> Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs
> ib_ipoib ib_cm
> ib_sa ipv6 ib_uverbs ib_umad iw_cxgb3 cxgb3 mlx4_en mlx4_ib
> mlx4_core binfmt_mi
> sc fuse dm_crypt crypto_blkcipher nls_iso8859_1 loop dm_mod sr_mod
> cdrom ib_mthc
> a tg3 ib_mad shpchp sg ib_core button libphy pci_hotplug qla2xxx
> mptctl usbhid h
> id ff_memless ohci_hcd ehci_hcd usbcore sd_mod crc_t10dif
> ide_generic mptfc scsi
> _transport_fc scsi_tgt mptspi scsi_transport_spi qla1280 sata_vsc
> sgiioc4 ioc4 x
> fs fan ide_pci_generic siimage ide_core mptsas mptscsih mptbase
> scsi_transport_s
> as thermal processor thermal_sys hwmon pata_sil680 libata scsi_mod
> dock
> Supported: Yes
>
> Pid: 4502, CPU 2, comm: rpc.nfsd
> psr : 0000101008526030 ifs : 800000000000040d ip :
> [<a0000002044acf30>] Not
> tainted (2.6.27.15-2-default)
> ip is at xs_udp_send_request+0x270/0x340 [sunrpc]
> unat: 0000000000000000 pfs : 000000000000040d rsc : 0000000000000003
> rnat: 0000000000000000 bsps: 0000000000000000 pr : aa99aaa6aa5aa999
> ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
> csd : 0000000000000000 ssd : 0000000000000000
> b0 : a0000002044acd30 b6 : a0000002044accc0 b7 : a0000002044c8fe0
> f6 : 1003e000000000000222e f7 : 1003e00000000000004e2
> f8 : 1003e00000000000009c4 f9 : 1003e0000001f3eb1b6ce
> f10 : 1003e93b691609417c0ec f11 : 1003e0000000000000013
> r1 : a0000002044d9760 r2 : a000000204511ba0 r3 : a000000204511ba0
> r8 : ffffffffffffff95 r9 : e0000060431825a8 r10 : a000000204511ce0
> r11 : 0000000000000000 r12 : e0000060434cfc30 r13 : e0000060434c0000
> r14 : 0000000000000000 r15 : e00000607bd79198 r16 : a0000002044e9fa8
> r17 : e000006078fc6040 r18 : 0000000000000054 r19 : e000006043182668
> r20 : ffffffffffffff95 r21 : 00000001000023ca r22 : e000006078fc6148
> r23 : a000000100efa780 r24 : a000000204511ce0 r25 : 0000000000000000
> r26 : e000006078fc613c r27 : e00000607bd795a8 r28 : 0000000000000000
> r29 : e00000607bd796e0 r30 : 0000000000000000 r31 : e00000607b3dc000
>
> Call Trace:
> [<a000000100016a30>] show_stack+0x50/0xa0
> sp=e0000060434cf790
> bsp=e0000060434c1720
> [<a0000001000172a0>] show_regs+0x820/0x860
> sp=e0000060434cf960
> bsp=e0000060434c16c8
> [<a0000001000266a0>] die+0x1a0/0x2e0
> sp=e0000060434cf960
> bsp=e0000060434c1688
> [<a000000100026830>] die_if_kernel+0x50/0x80
> sp=e0000060434cf960
> bsp=e0000060434c1658
> [<a0000001006e79a0>] ia64_fault+0xac0/0xb60
> sp=e0000060434cf960
> bsp=e0000060434c1610
> [<a00000010000c7a0>] ia64_native_leave_kernel+0x0/0x270
> sp=e0000060434cfa60
> bsp=e0000060434c1610
> [<a0000002044acf30>] xs_udp_send_request+0x270/0x340 [sunrpc]
> sp=e0000060434cfc30
> bsp=e0000060434c15a0
> [<a0000002044a9a20>] xprt_transmit+0x3a0/0x620 [sunrpc]
> sp=e0000060434cfc30
> bsp=e0000060434c1560
> [<a0000002044a28a0>] call_transmit+0x600/0x720 [sunrpc]
> sp=e0000060434cfc30
> bsp=e0000060434c1520
> [<a0000002044b5690>] __rpc_execute+0x1d0/0x7a0 [sunrpc]
> sp=e0000060434cfc30
> bsp=e0000060434c14a0
> [<a0000002044b5ce0>] rpc_execute+0x80/0xa0 [sunrpc]
> sp=e0000060434cfc30
> bsp=e0000060434c1480
> [<a0000002044a43d0>] rpc_run_task+0xf0/0x120 [sunrpc]
> sp=e0000060434cfc30
> bsp=e0000060434c1460
> [<a0000002044a46e0>] rpc_call_sync+0xe0/0x160 [sunrpc]
> sp=e0000060434cfc30
> bsp=e0000060434c1430
> [<a0000002044ca500>] rpcb_register_call+0x120/0x220 [sunrpc]
> sp=e0000060434cfc70
> bsp=e0000060434c13e8
> [<a0000002044ca7b0>] rpcb_register+0x1b0/0x1e0 [sunrpc]
> sp=e0000060434cfcc0
> bsp=e0000060434c1398
> [<a0000002044bd200>] svc_register+0x1e0/0x360 [sunrpc]
> sp=e0000060434cfd20
> bsp=e0000060434c1310
> [<a0000002044c0c50>] svc_setup_socket+0x150/0x9a0 [sunrpc]
> sp=e0000060434cfd30
> bsp=e0000060434c12c0
> [<a0000002044c3e10>] svc_addsock+0x1d0/0x480 [sunrpc]
> sp=e0000060434cfd40
> bsp=e0000060434c1270
> [<a000000204b52550>] write_ports+0x1f0/0x6c0 [nfsd]
> sp=e0000060434cfdd0
> bsp=e0000060434c1228
> [<a000000204b54270>] nfsctl_transaction_write+0xf0/0x1c0 [nfsd]
> sp=e0000060434cfe20
> bsp=e0000060434c11e8
> [<a0000001001b6a90>] vfs_write+0x1b0/0x360
> sp=e0000060434cfe20
> bsp=e0000060434c1198
> [<a0000001001b6de0>] sys_write+0x80/0x100
> sp=e0000060434cfe20
> bsp=e0000060434c1120
> [<a00000010000c600>] ia64_ret_from_syscall+0x0/0x20
> sp=e0000060434cfe30
> bsp=e0000060434c1120
> [<a000000000010720>] __kernel_syscall_via_break+0x0/0x20
> sp=e0000060434d0000
> bsp=e0000060434c1120
>
> If I read it correctly the xs_sendpages return value is in r8, -
> ENOTCONN. We
> had xs_udp_finish_connecting in rpciod racing with the above write.
I had a patch that addressed a similar race in the UDP connect logic.
Trond, did you get that one when I sent it before Connectathon?
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2009-03-02 16:39 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-02-23 20:11 BUG NULL pointer dereference in SUNRPC xs_udp_send_request Aaron Straus
2009-02-23 20:11 ` Aaron Straus
[not found] ` <20090223201108.GB3308-bYFJunmd+ZV8UrSeD/g0lQ@public.gmane.org>
2009-02-25 2:39 ` Ben Myers
2009-02-25 2:39 ` Ben Myers
2009-02-26 0:17 ` Aaron Straus
2009-02-26 0:17 ` Aaron Straus
[not found] ` <20090226001744.GB7613-bYFJunmd+ZV8UrSeD/g0lQ@public.gmane.org>
2009-02-27 23:54 ` [PATCH] sunrpc: xprt is not connected until after sock is set Ben Myers
2009-02-28 0:37 ` Trond Myklebust
[not found] ` <1235781463.20549.33.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2009-02-28 1:34 ` Aaron Straus
[not found] ` <20090228013457.GF7706-bYFJunmd+ZV8UrSeD/g0lQ@public.gmane.org>
2009-02-28 1:40 ` Trond Myklebust
[not found] ` <1235785237.20549.51.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2009-02-28 4:57 ` Aaron Straus
2009-02-28 5:07 ` Aaron Straus
[not found] ` <20090228050707.GB22330-bYFJunmd+ZV8UrSeD/g0lQ@public.gmane.org>
2009-02-28 18:09 ` Trond Myklebust
[not found] ` <1235844568.7677.9.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2009-03-02 16:36 ` Ben Myers
2009-03-02 16:39 ` Chuck Lever
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.