* BUG at net/sunrpc/svc_xprt.c:921
@ 2013-02-07 12:56 Tom Horsley
2013-02-08 20:58 ` J. Bruce Fields
0 siblings, 1 reply; 22+ messages in thread
From: Tom Horsley @ 2013-02-07 12:56 UTC (permalink / raw)
To: linux-kernel
I noticed some previous messages with this subject, but the
walkback I'm getting doesn't match exactly the ones shown
in the threads I saw, so I figured I'd send this in.
This happens on both my Fedora 18 and Fedora 17 partitions
when mounting filesystems from very old servers that
need the proto=udp option to talk.
The redhat bugzilla is here:
https://bugzilla.redhat.com/show_bug.cgi?id=908451
There is a photo of the walkback in that bugzilla:
svc_delete_xprt+0x12a
svc_recv+0x101
? nfs_callback_authenticate+0x50
nfs4_callback_svc+0x3b
kthread+0xc0
? ftrace_raw_event_xen_mmu_flush_tlb_others+0x50
? kthread_create_on_node+0x120
ret_from_fork+0x7c
? kthread_create_on_node+0x120
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG at net/sunrpc/svc_xprt.c:921
2013-02-07 12:56 BUG at net/sunrpc/svc_xprt.c:921 Tom Horsley
@ 2013-02-08 20:58 ` J. Bruce Fields
0 siblings, 0 replies; 22+ messages in thread
From: J. Bruce Fields @ 2013-02-08 20:58 UTC (permalink / raw)
To: Tom Horsley; +Cc: linux-kernel
On Thu, Feb 07, 2013 at 07:56:51AM -0500, Tom Horsley wrote:
> I noticed some previous messages with this subject, but the
> walkback I'm getting doesn't match exactly the ones shown
> in the threads I saw, so I figured I'd send this in.
>
> This happens on both my Fedora 18 and Fedora 17 partitions
> when mounting filesystems from very old servers that
> need the proto=udp option to talk.
>
> The redhat bugzilla is here:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=908451
>
> There is a photo of the walkback in that bugzilla:
>
> svc_delete_xprt+0x12a
> svc_recv+0x101
> ? nfs_callback_authenticate+0x50
> nfs4_callback_svc+0x3b
> kthread+0xc0
> ? ftrace_raw_event_xen_mmu_flush_tlb_others+0x50
> ? kthread_create_on_node+0x120
> ret_from_fork+0x7c
> ? kthread_create_on_node+0x120
OK, yes this is the

	if (test_and_set_bit(XPT_DEAD, &xprt->xpt_flags))
		BUG();

in svc_delete_xprt() and is a known (but unfixed) problem. Stanislav
has some patches, so that should be fixed soon....
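
As a rough user-space illustration of the "only do this once" guard in the check quoted in this message (a sketch only, not the kernel implementation), the code below models test_and_set_bit() with a C11 atomic fetch-or. The names sketch_test_and_set_bit, sketch_delete_xprt and SKETCH_XPT_DEAD are hypothetical, and the bit number is arbitrary. Instead of calling BUG(), the second caller simply gets false back.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the XPT_DEAD bit number; the value is arbitrary. */
enum { SKETCH_XPT_DEAD = 6 };

/* Atomically set bit `nr` in *flags and report whether it was already set.
 * This mirrors the contract of test_and_set_bit(), not its implementation. */
static bool sketch_test_and_set_bit(unsigned nr, atomic_ulong *flags)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(flags, mask) & mask) != 0;
}

/* Returns true if this call marked the transport dead, and false if it was
 * already dead, which is the case where the kernel code would hit BUG(). */
static bool sketch_delete_xprt(atomic_ulong *xpt_flags)
{
	if (sketch_test_and_set_bit(SKETCH_XPT_DEAD, xpt_flags))
		return false;
	return true;
}
```

Calling sketch_delete_xprt() twice on the same flags word succeeds the first time and reports the double delete the second time, which is the situation the BUG() is reporting.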
--b.
* BUG at net/sunrpc/svc_xprt.c:921
@ 2013-01-14 16:17 Mark Lord
2013-01-14 20:37 ` J. Bruce Fields
2013-01-17 13:11 ` Mark Lord
0 siblings, 2 replies; 22+ messages in thread
From: Mark Lord @ 2013-01-14 16:17 UTC (permalink / raw)
To: J. Bruce Fields, linux-nfs, Linux Kernel
[-- Attachment #1: Type: text/plain, Size: 3221 bytes --]
Since upgrading to 3.7, and now 3.7.2, my AMD-450E based server
is getting these BUG complaints. The .config file is gzip'd/attached.
------------[ cut here ]------------
kernel BUG at net/sunrpc/svc_xprt.c:921!
invalid opcode: 0000 [#1] SMP
Modules linked in: nfsv4 xt_state xt_tcpudp xt_recent xt_LOG xt_limit iptable_mangle iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables
sc520_wdt btusb snd_usb_audio snd_usbmidi_lib hid_generic ftdi_sio usbserial usbhid hid
snd_hda_codec_realtek psmouse snd_hda_codec_hdmi r8169 xhci_hcd mii snd_hda_intel snd_hda_codec
snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq bnep
snd_timer rfcomm snd_seq_device bluetooth snd nfsd auth_rpcgss binfmt_misc radeon nfs lockd sunrpc
soundcore ttm snd_page_alloc drm_kms_helper drm i2c_algo_bit it87 hwmon_vid k10temp hwmon microcode
CPU 0
Pid: 29613, comm: nfsv4.0-svc Not tainted 3.7.2 #1 System manufacturer System Product Name/E45M1-I
DELUXE
RIP: 0010:[<ffffffffa01696cd>] [<ffffffffa01696cd>] svc_delete_xprt+0x23/0xeb [sunrpc]
RSP: 0018:ffff880234f05e38 EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff8801b931b000 RCX: dead000000200200
RDX: dead000000100100 RSI: ffff8801b931b038 RDI: 0000000000000006
RBP: ffff880049125e40 R08: 0000000000000606 R09: ffff88023ec10fc0
R10: ffff88023ec10fc0 R11: ffff88023ec10fc0 R12: ffff880049125e40
R13: ffff8801b931b038 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f5bef2fd700(0000) GS:ffff88023ec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f2f1800bfa0 CR3: 000000015ba2e000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nfsv4.0-svc (pid: 29613, threadinfo ffff880234f04000, task ffff88021b51a280)
Stack:
0000000000001cc7 ffff8801a5f2e000 ffff8801b931b000 ffff880049125e40
ffff880049125e40 ffffffffa016a56a 0000000000010fc0 ffff880234f05fd8
ffff880234f05fd8 ffff8801a5f2e000 ffff8801a5f2e000 ffff880234f05f08
Call Trace:
[<ffffffffa016a56a>] ? svc_recv+0xcc/0x338 [sunrpc]
[<ffffffffa0318bfc>] ? nfs_callback_authenticate+0x20/0x20 [nfsv4]
[<ffffffffa0318c19>] ? nfs4_callback_svc+0x1d/0x3c [nfsv4]
[<ffffffff810407e6>] ? kthread+0x81/0x89
[<ffffffff81040765>] ? kthread_freezable_should_stop+0x36/0x36
[<ffffffff812ea62c>] ? ret_from_fork+0x7c/0xb0
[<ffffffff81040765>] ? kthread_freezable_should_stop+0x36/0x36
Code: c2 84 d2 74 02 eb a0 c3 41 55 4c 8d 6f 38 41 54 4c 89 ee 55 53 48 89 fb 50 48 8b 6f 40 bf 06
00 00 00 e8 77 fa ff ff 85 c0 74 02 <0f> 0b 48 8b 43 08 4c 8d 65 10 48 89 df ff 50 38 4c 89 e7 e8 6d
RIP [<ffffffffa01696cd>] svc_delete_xprt+0x23/0xeb [sunrpc]
RSP <ffff880234f05e38>
---[ end trace 916f6471c0b47e1d ]---
Here's the code with the BUG() at net/sunrpc/svc_xprt.c line 921:
/*
 * Remove a dead transport
 */
static void svc_delete_xprt(struct svc_xprt *xprt)
{
	struct svc_serv *serv = xprt->xpt_server;
	struct svc_deferred_req *dr;

	/* Only do this once */
	if (test_and_set_bit(XPT_DEAD, &xprt->xpt_flags))
		BUG();
	...
[-- Attachment #2: config.txt.gz --]
[-- Type: application/x-gzip, Size: 17634 bytes --]
* Re: BUG at net/sunrpc/svc_xprt.c:921
2013-01-14 16:17 Mark Lord
@ 2013-01-14 20:37 ` J. Bruce Fields
2013-01-15 4:16 ` Mark Lord
2013-01-17 13:11 ` Mark Lord
1 sibling, 1 reply; 22+ messages in thread
From: J. Bruce Fields @ 2013-01-14 20:37 UTC (permalink / raw)
To: Mark Lord; +Cc: linux-nfs, Linux Kernel
Thanks for the report.
On Mon, Jan 14, 2013 at 11:17:09AM -0500, Mark Lord wrote:
> Since upgrading to 3.7, and now 3.7.2, my AMD-450E based server
It's acting as an NFS client, right?
What did you upgrade from?
> is getting these BUG complaints. The .config file is gzip'd/attached.
Is this easy to reproduce?
--b.
>
> ------------[ cut here ]------------
> kernel BUG at net/sunrpc/svc_xprt.c:921!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: nfsv4 xt_state xt_tcpudp xt_recent xt_LOG xt_limit iptable_mangle iptable_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables
> sc520_wdt btusb snd_usb_audio snd_usbmidi_lib hid_generic ftdi_sio usbserial usbhid hid
> snd_hda_codec_realtek psmouse snd_hda_codec_hdmi r8169 xhci_hcd mii snd_hda_intel snd_hda_codec
> snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq bnep
> snd_timer rfcomm snd_seq_device bluetooth snd nfsd auth_rpcgss binfmt_misc radeon nfs lockd sunrpc
> soundcore ttm snd_page_alloc drm_kms_helper drm i2c_algo_bit it87 hwmon_vid k10temp hwmon microcode
> CPU 0
> Pid: 29613, comm: nfsv4.0-svc Not tainted 3.7.2 #1 System manufacturer System Product Name/E45M1-I
> DELUXE
> RIP: 0010:[<ffffffffa01696cd>] [<ffffffffa01696cd>] svc_delete_xprt+0x23/0xeb [sunrpc]
> RSP: 0018:ffff880234f05e38 EFLAGS: 00010286
> RAX: 00000000ffffffff RBX: ffff8801b931b000 RCX: dead000000200200
> RDX: dead000000100100 RSI: ffff8801b931b038 RDI: 0000000000000006
> RBP: ffff880049125e40 R08: 0000000000000606 R09: ffff88023ec10fc0
> R10: ffff88023ec10fc0 R11: ffff88023ec10fc0 R12: ffff880049125e40
> R13: ffff8801b931b038 R14: 0000000000000000 R15: 0000000000000000
> FS: 00007f5bef2fd700(0000) GS:ffff88023ec00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f2f1800bfa0 CR3: 000000015ba2e000 CR4: 00000000000007f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process nfsv4.0-svc (pid: 29613, threadinfo ffff880234f04000, task ffff88021b51a280)
> Stack:
> 0000000000001cc7 ffff8801a5f2e000 ffff8801b931b000 ffff880049125e40
> ffff880049125e40 ffffffffa016a56a 0000000000010fc0 ffff880234f05fd8
> ffff880234f05fd8 ffff8801a5f2e000 ffff8801a5f2e000 ffff880234f05f08
> Call Trace:
> [<ffffffffa016a56a>] ? svc_recv+0xcc/0x338 [sunrpc]
> [<ffffffffa0318bfc>] ? nfs_callback_authenticate+0x20/0x20 [nfsv4]
> [<ffffffffa0318c19>] ? nfs4_callback_svc+0x1d/0x3c [nfsv4]
> [<ffffffff810407e6>] ? kthread+0x81/0x89
> [<ffffffff81040765>] ? kthread_freezable_should_stop+0x36/0x36
> [<ffffffff812ea62c>] ? ret_from_fork+0x7c/0xb0
> [<ffffffff81040765>] ? kthread_freezable_should_stop+0x36/0x36
> Code: c2 84 d2 74 02 eb a0 c3 41 55 4c 8d 6f 38 41 54 4c 89 ee 55 53 48 89 fb 50 48 8b 6f 40 bf 06
> 00 00 00 e8 77 fa ff ff 85 c0 74 02 <0f> 0b 48 8b 43 08 4c 8d 65 10 48 89 df ff 50 38 4c 89 e7 e8 6d
> RIP [<ffffffffa01696cd>] svc_delete_xprt+0x23/0xeb [sunrpc]
> RSP <ffff880234f05e38>
> ---[ end trace 916f6471c0b47e1d ]---
>
>
> Here's the code with the BUG() at net/sunrpc/svc_xprt.c line 921:
>
> /*
> * Remove a dead transport
> */
> static void svc_delete_xprt(struct svc_xprt *xprt)
> {
> struct svc_serv *serv = xprt->xpt_server;
> struct svc_deferred_req *dr;
>
> /* Only do this once */
> if (test_and_set_bit(XPT_DEAD, &xprt->xpt_flags))
> BUG();
> ...
>
>
>
>
* Re: BUG at net/sunrpc/svc_xprt.c:921
2013-01-14 20:37 ` J. Bruce Fields
@ 2013-01-15 4:16 ` Mark Lord
2013-01-15 20:56 ` J. Bruce Fields
0 siblings, 1 reply; 22+ messages in thread
From: Mark Lord @ 2013-01-15 4:16 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs, Linux Kernel
On 13-01-14 03:37 PM, J. Bruce Fields wrote:
> Thanks for the report.
>
> On Mon, Jan 14, 2013 at 11:17:09AM -0500, Mark Lord wrote:
>> Since upgrading to 3.7, and now 3.7.2, my AMD-450E based server
>
> It's acting as an NFS client, right?
Client and server, with other Linux boxes all running 3.something kernels.
> What did you upgrade from?
3.4.something, I believe.
>> is getting these BUG complaints. The .config file is gzip'd/attached.
>
> Is this easy to reproduce?
So far, it seems to pop up within a day or so of any reboot.
I normally only reboot that system for a kernel upgrade,
but can do so a bit more often if there's useful info to collect.
Cheers
* Re: BUG at net/sunrpc/svc_xprt.c:921
2013-01-15 4:16 ` Mark Lord
@ 2013-01-15 20:56 ` J. Bruce Fields
2013-01-16 5:20 ` Stanislav Kinsbursky
0 siblings, 1 reply; 22+ messages in thread
From: J. Bruce Fields @ 2013-01-15 20:56 UTC (permalink / raw)
To: Mark Lord; +Cc: linux-nfs, Linux Kernel, Stanislav Kinsbursky
On Mon, Jan 14, 2013 at 11:16:00PM -0500, Mark Lord wrote:
> On 13-01-14 03:37 PM, J. Bruce Fields wrote:
> > Thanks for the report.
> >
> > On Mon, Jan 14, 2013 at 11:17:09AM -0500, Mark Lord wrote:
> >> Since upgrading to 3.7, and now 3.7.2, my AMD-450E based server
> >
> > It's acting as an NFS client, right?
>
> Client and server, with other Linux boxes all running 3.something kernels.
>
> > What did you upgrade from?
>
> 3.4.something, I believe.
>
> >> is getting these BUG complaints. The .config file is gzip'd/attached.
> >
> > Is this easy to reproduce?
>
> So far, it seems to pop up within a day or so of any reboot.
> I normally only reboot that system for a kernel upgrade,
> but can do so a bit more often if there's useful info to collect.
So this means svc_delete_xprt was called on an xprt twice.
That could happen if server threads are still running (and calling
svc_recv) after we start shutting down the server: svc_shutdown_net
assumes that server threads are already shut down, but that isn't true
any more after the containerization work.
I thought that would only be a bug for users actually running multiple
containers, but looking at nfs_callback_down, I don't think that's
true--it seems to always shut down the thread last.
--b.
* Re: BUG at net/sunrpc/svc_xprt.c:921
2013-01-15 20:56 ` J. Bruce Fields
@ 2013-01-16 5:20 ` Stanislav Kinsbursky
2013-01-16 22:51 ` Mark Lord
0 siblings, 1 reply; 22+ messages in thread
From: Stanislav Kinsbursky @ 2013-01-16 5:20 UTC (permalink / raw)
To: J. Bruce Fields, Mark Lord; +Cc: linux-nfs, Linux Kernel
16.01.2013 00:56, J. Bruce Fields wrote:
> On Mon, Jan 14, 2013 at 11:16:00PM -0500, Mark Lord wrote:
>> On 13-01-14 03:37 PM, J. Bruce Fields wrote:
>>> Thanks for the report.
>>>
>>> On Mon, Jan 14, 2013 at 11:17:09AM -0500, Mark Lord wrote:
>>>> Since upgrading to 3.7, and now 3.7.2, my AMD-450E based server
>>>
>>> It's acting as an NFS client, right?
>>
>> Client and server, with other Linux boxes all running 3.something kernels.
>>
>>> What did you upgrade from?
>>
>> 3.4.something, I believe.
>>
>>>> is getting these BUG complaints. The .config file is gzip'd/attached.
>>>
>>> Is this easy to reproduce?
>>
>> So far, it seems to pop up within a day or so of any reboot.
>> I normally only reboot that system for a kernel upgrade,
>> but can do so a bit more often if there's useful info to collect.
>
> So this means svc_delete_xprt was called on an xprt twice.
>
> That could happen if server threads are still running (and calling
> svc_recv) after we start shutting down the server: svc_shutdown_net
> assumes that server threads are already shut down, but that isn't true
> any more after the containerization work.
>
> I thought that would only be a bug for users actually running multiple
> containers, but looking at nfs_callback_down, I don't think that's
> true--it seems to always shut down the thread last.
>
Thanks, Bruce. It reminds me of the patch with additional protection for permanent sockets shutdown that I sent you a couple of months ago...
Looks like I should revisit this patch at least.
Mark, could you provide any call traces?
> --b.
>
--
Best regards,
Stanislav Kinsbursky
* Re: BUG at net/sunrpc/svc_xprt.c:921
2013-01-16 5:20 ` Stanislav Kinsbursky
@ 2013-01-16 22:51 ` Mark Lord
2013-01-16 22:58 ` Mark Lord
2013-01-17 5:05 ` Stanislav Kinsbursky
0 siblings, 2 replies; 22+ messages in thread
From: Mark Lord @ 2013-01-16 22:51 UTC (permalink / raw)
To: Stanislav Kinsbursky; +Cc: J. Bruce Fields, linux-nfs, Linux Kernel
On 13-01-16 12:20 AM, Stanislav Kinsbursky wrote:
>
> Mark, could you provide any call traces?
Call traces from where/what?
There's this one, posted earlier in the BUG report:
kernel BUG at net/sunrpc/svc_xprt.c:921!
Call Trace:
[<ffffffffa016a56a>] ? svc_recv+0xcc/0x338 [sunrpc]
[<ffffffffa0318bfc>] ? nfs_callback_authenticate+0x20/0x20 [nfsv4]
[<ffffffffa0318c19>] ? nfs4_callback_svc+0x1d/0x3c [nfsv4]
[<ffffffff810407e6>] ? kthread+0x81/0x89
[<ffffffff81040765>] ? kthread_freezable_should_stop+0x36/0x36
[<ffffffff812ea62c>] ? ret_from_fork+0x7c/0xb0
[<ffffffff81040765>] ? kthread_freezable_should_stop+0x36/0x36
* Re: BUG at net/sunrpc/svc_xprt.c:921
2013-01-16 22:51 ` Mark Lord
@ 2013-01-16 22:58 ` Mark Lord
2013-01-17 5:05 ` Stanislav Kinsbursky
1 sibling, 0 replies; 22+ messages in thread
From: Mark Lord @ 2013-01-16 22:58 UTC (permalink / raw)
To: Stanislav Kinsbursky; +Cc: J. Bruce Fields, linux-nfs, Linux Kernel
On 13-01-16 05:51 PM, Mark Lord wrote:
> On 13-01-16 12:20 AM, Stanislav Kinsbursky wrote:
>>
>> Mark, could you provide any call traces?
>
> Call traces from where/what?
> There's this one, posted earlier in the BUG report:
>
> kernel BUG at net/sunrpc/svc_xprt.c:921!
> Call Trace:
> [<ffffffffa016a56a>] ? svc_recv+0xcc/0x338 [sunrpc]
> [<ffffffffa0318bfc>] ? nfs_callback_authenticate+0x20/0x20 [nfsv4]
> [<ffffffffa0318c19>] ? nfs4_callback_svc+0x1d/0x3c [nfsv4]
> [<ffffffff810407e6>] ? kthread+0x81/0x89
> [<ffffffff81040765>] ? kthread_freezable_should_stop+0x36/0x36
> [<ffffffff812ea62c>] ? ret_from_fork+0x7c/0xb0
> [<ffffffff81040765>] ? kthread_freezable_should_stop+0x36/0x36
..
This might be of some interest.
Here are the first few lines of the same BUG occurrence,
with timestamps and the dmesg lines that immediately preceded it.
Perhaps they might help indicate who's triggering the action
that results in the BUG(?).
Jan 14 10:58:05 zippy kernel: [66045.627952] NFS: Registering the id_resolver key type
Jan 14 10:58:05 zippy kernel: [66045.628014] Key type id_resolver registered
Jan 14 10:58:05 zippy kernel: [66045.628020] Key type id_legacy registered
Jan 14 10:58:05 zippy kernel: [66045.636302] ------------[ cut here ]------------
Jan 14 10:58:05 zippy kernel: [66045.648342] kernel BUG at net/sunrpc/svc_xprt.c:921!
* Re: BUG at net/sunrpc/svc_xprt.c:921
2013-01-16 22:51 ` Mark Lord
2013-01-16 22:58 ` Mark Lord
@ 2013-01-17 5:05 ` Stanislav Kinsbursky
2013-01-17 13:03 ` J. Bruce Fields
1 sibling, 1 reply; 22+ messages in thread
From: Stanislav Kinsbursky @ 2013-01-17 5:05 UTC (permalink / raw)
To: Mark Lord; +Cc: J. Bruce Fields, linux-nfs, Linux Kernel
17.01.2013 02:51, Mark Lord wrote:
> On 13-01-16 12:20 AM, Stanislav Kinsbursky wrote:
>>
>> Mark, could you provide any call traces?
>
> Call traces from where/what?
> There's this one, posted earlier in the BUG report:
>
> kernel BUG at net/sunrpc/svc_xprt.c:921!
> Call Trace:
> [<ffffffffa016a56a>] ? svc_recv+0xcc/0x338 [sunrpc]
> [<ffffffffa0318bfc>] ? nfs_callback_authenticate+0x20/0x20 [nfsv4]
> [<ffffffffa0318c19>] ? nfs4_callback_svc+0x1d/0x3c [nfsv4]
> [<ffffffff810407e6>] ? kthread+0x81/0x89
> [<ffffffff81040765>] ? kthread_freezable_should_stop+0x36/0x36
> [<ffffffff812ea62c>] ? ret_from_fork+0x7c/0xb0
> [<ffffffff81040765>] ? kthread_freezable_should_stop+0x36/0x36
>
Thanks!
I haven't seen the bug report.
Could you provide the link, please?
--
Best regards,
Stanislav Kinsbursky
* Re: BUG at net/sunrpc/svc_xprt.c:921
2013-01-17 5:05 ` Stanislav Kinsbursky
@ 2013-01-17 13:03 ` J. Bruce Fields
2013-01-17 13:24 ` Stanislav Kinsbursky
0 siblings, 1 reply; 22+ messages in thread
From: J. Bruce Fields @ 2013-01-17 13:03 UTC (permalink / raw)
To: Stanislav Kinsbursky; +Cc: Mark Lord, linux-nfs, Linux Kernel
On Thu, Jan 17, 2013 at 09:05:51AM +0400, Stanislav Kinsbursky wrote:
> 17.01.2013 02:51, Mark Lord wrote:
> >On 13-01-16 12:20 AM, Stanislav Kinsbursky wrote:
> >>
> >>Mark, could you provide any call traces?
> >
> >Call traces from where/what?
> >There's this one, posted earlier in the BUG report:
> >
> >kernel BUG at net/sunrpc/svc_xprt.c:921!
> >Call Trace:
> > [<ffffffffa016a56a>] ? svc_recv+0xcc/0x338 [sunrpc]
> > [<ffffffffa0318bfc>] ? nfs_callback_authenticate+0x20/0x20 [nfsv4]
> > [<ffffffffa0318c19>] ? nfs4_callback_svc+0x1d/0x3c [nfsv4]
> > [<ffffffff810407e6>] ? kthread+0x81/0x89
> > [<ffffffff81040765>] ? kthread_freezable_should_stop+0x36/0x36
> > [<ffffffff812ea62c>] ? ret_from_fork+0x7c/0xb0
> > [<ffffffff81040765>] ? kthread_freezable_should_stop+0x36/0x36
> >
>
> Thanks!
> I haven't seen the bug report.
> Could you provide the link, please?
There's no bz if that's what you're asking for.
See the first message in the thread for the original report:
http://mid.gmane.org/<50F42F85.50907@teksavvy.com>
--b.
* Re: BUG at net/sunrpc/svc_xprt.c:921
2013-01-17 13:03 ` J. Bruce Fields
@ 2013-01-17 13:24 ` Stanislav Kinsbursky
2013-01-17 23:41 ` Mark Lord
0 siblings, 1 reply; 22+ messages in thread
From: Stanislav Kinsbursky @ 2013-01-17 13:24 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Mark Lord, linux-nfs, Linux Kernel
17.01.2013 17:03, J. Bruce Fields wrote:
> On Thu, Jan 17, 2013 at 09:05:51AM +0400, Stanislav Kinsbursky wrote:
>> 17.01.2013 02:51, Mark Lord wrote:
>>> On 13-01-16 12:20 AM, Stanislav Kinsbursky wrote:
>>>>
>>>> Mark, could you provide any call traces?
>>>
>>> Call traces from where/what?
>>> There's this one, posted earlier in the BUG report:
>>>
>>> kernel BUG at net/sunrpc/svc_xprt.c:921!
>>> Call Trace:
>>> [<ffffffffa016a56a>] ? svc_recv+0xcc/0x338 [sunrpc]
>>> [<ffffffffa0318bfc>] ? nfs_callback_authenticate+0x20/0x20 [nfsv4]
>>> [<ffffffffa0318c19>] ? nfs4_callback_svc+0x1d/0x3c [nfsv4]
>>> [<ffffffff810407e6>] ? kthread+0x81/0x89
>>> [<ffffffff81040765>] ? kthread_freezable_should_stop+0x36/0x36
>>> [<ffffffff812ea62c>] ? ret_from_fork+0x7c/0xb0
>>> [<ffffffff81040765>] ? kthread_freezable_should_stop+0x36/0x36
>>>
>>
>> Thanks!
>> I haven't seen the bug report.
>> Could you provide the link, please?
>
> There's no bz if that's what you're asking for.
>
> See the first message in the thread for the original report:
>
> http://mid.gmane.org/<50F42F85.50907@teksavvy.com>
>
Thanks, Bruce.
This looks like the old issue I was trying to fix with "SUNRPC: protect service sockets lists during per-net shutdown".
So, here is the problem as I see it: there is a transport that is being processed by a service thread, and its processing races with the per-net service shutdown:

CPU#0:                                          CPU#1:

svc_recv                                        svc_close_net
svc_get_next_xprt (list_del_init(xpt_ready))
                                                svc_close_list (set XPT_BUSY and XPT_CLOSE)
                                                svc_clear_pools (xprt was already taken on CPU#0)
                                                svc_delete_xprt (set XPT_DEAD)
svc_handle_xprt (XPT_CLOSE is set => svc_delete_xprt())
BUG()

So, from my POV, we need some way to:
1) Skip such in-progress transports in the svc_close_net() call (there is no way to detect them, or at least I don't see one)
2) Delete the transport somewhere after svc_xprt_received()
But there is a problem with svc_xprt_received(): it calls svc_xprt_put() (svc_recv->svc_handle_xprt->svc_xprt_received->svc_xprt_put). And if
we are the only user, then the transport will be destroyed. But the transport is dereferenced later in svc_recv(), after the svc_handle_xprt() call.
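
The interleaving described above can be replayed deterministically in a small single-threaded user-space sketch. This is only a model of the sequence as described in this message, not kernel code: the sk_ names are hypothetical, the flag values are arbitrary, and the double delete is counted rather than turned into a BUG().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits: the names follow the thread, the values are arbitrary. */
enum {
	SK_XPT_BUSY  = 1u << 0,
	SK_XPT_CLOSE = 1u << 1,
	SK_XPT_DEAD  = 1u << 2,
};

struct sk_xprt {
	unsigned flags;
	bool on_ready_list;
	int double_delete;	/* times the kernel code would have hit BUG() */
};

static void sk_delete_xprt(struct sk_xprt *x)
{
	if (x->flags & SK_XPT_DEAD) {
		x->double_delete++;
		return;
	}
	x->flags |= SK_XPT_DEAD;
}

/* CPU#0, step 1: the service thread takes the transport off the ready list. */
static struct sk_xprt *sk_get_next_xprt(struct sk_xprt *x)
{
	x->on_ready_list = false;
	return x;
}

/* CPU#1: per-net shutdown, unaware that CPU#0 already holds the transport. */
static void sk_close_net(struct sk_xprt *x)
{
	x->flags |= SK_XPT_BUSY | SK_XPT_CLOSE;	/* svc_close_list step */
	/* svc_clear_pools step: nothing to do, x already left the ready list */
	sk_delete_xprt(x);			/* sets SK_XPT_DEAD */
}

/* CPU#0, step 2: the service thread handles the transport it took earlier. */
static void sk_handle_xprt(struct sk_xprt *x)
{
	if (x->flags & SK_XPT_CLOSE)
		sk_delete_xprt(x);		/* second delete */
}

/* Replays the interleaving and returns the number of double deletes. */
static int sk_replay_race(void)
{
	struct sk_xprt x = { .flags = 0, .on_ready_list = true, .double_delete = 0 };
	struct sk_xprt *held = sk_get_next_xprt(&x);

	sk_close_net(&x);
	sk_handle_xprt(held);
	return x.double_delete;
}
```

In this model sk_replay_race() reports exactly one double delete, which corresponds to the BUG() at the end of the CPU#0 column.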
What do you think, Bruce?
> --b.
>
--
Best regards,
Stanislav Kinsbursky
* Re: BUG at net/sunrpc/svc_xprt.c:921
2013-01-17 13:24 ` Stanislav Kinsbursky
@ 2013-01-17 23:41 ` Mark Lord
2013-01-18 5:37 ` Stanislav Kinsbursky
0 siblings, 1 reply; 22+ messages in thread
From: Mark Lord @ 2013-01-17 23:41 UTC (permalink / raw)
To: Stanislav Kinsbursky; +Cc: J. Bruce Fields, linux-nfs, Linux Kernel
On 13-01-17 08:24 AM, Stanislav Kinsbursky wrote:
..
> This looks like the old issue I was trying to fix with "SUNRPC: protect service sockets lists during
> per-net shutdown".
> So, here is the problem as I see it: there is a transport, which is processed by service thread and
> it's processing is racing with per-net service shutdown:
>
> CPU#0: CPU#1:
>
> svc_recv svc_close_net
> svc_get_next_xprt (list_del_init(xpt_ready))
> svc_close_list (set XPT_BUSY and XPT_CLOSE)
> svc_clear_pools(xprt was gained on CPU#0 already)
> svc_delete_xprt (set XPT_DEAD)
> svc_handle_xprt (is XPT_CLOSE => svc_delete_xprt()
> BUG()
>
> So, from my POW, we need some way to:
> 1) Skip such in-progress transports on svc_close_net() call (there is not way to detect them, or at
> least I don't see one)
> 2) Delete the transport after somewhere after svc_xprt_received()
>
> But there is a problem with svc_xprt_received(): there is a call for svc_xprt_put() in it
> (svc_recv->svc_handle_xprt->svc_xprt_received->svc_xprt_put) . And if we are the only user - then
> the transport will be destroyed. But transport is dereferenced later in svc_recv() after the
> svc_handle_xprt call.
Sounds like a reference count type of problem/solution (kref) (?)
* Re: BUG at net/sunrpc/svc_xprt.c:921
2013-01-17 23:41 ` Mark Lord
@ 2013-01-18 5:37 ` Stanislav Kinsbursky
2013-01-18 15:48 ` Mark Lord
0 siblings, 1 reply; 22+ messages in thread
From: Stanislav Kinsbursky @ 2013-01-18 5:37 UTC (permalink / raw)
To: Mark Lord; +Cc: J. Bruce Fields, linux-nfs, Linux Kernel
18.01.2013 03:41, Mark Lord wrote:
> On 13-01-17 08:24 AM, Stanislav Kinsbursky wrote:
> ..
>> This looks like the old issue I was trying to fix with "SUNRPC: protect service sockets lists during
>> per-net shutdown".
>> So, here is the problem as I see it: there is a transport, which is processed by service thread and
>> it's processing is racing with per-net service shutdown:
>>
>> CPU#0: CPU#1:
>>
>> svc_recv svc_close_net
>> svc_get_next_xprt (list_del_init(xpt_ready))
>> svc_close_list (set XPT_BUSY and XPT_CLOSE)
>> svc_clear_pools(xprt was gained on CPU#0 already)
>> svc_delete_xprt (set XPT_DEAD)
>> svc_handle_xprt (is XPT_CLOSE => svc_delete_xprt()
>> BUG()
>>
>> So, from my POW, we need some way to:
>> 1) Skip such in-progress transports on svc_close_net() call (there is not way to detect them, or at
>> least I don't see one)
>> 2) Delete the transport after somewhere after svc_xprt_received()
>>
>> But there is a problem with svc_xprt_received(): there is a call for svc_xprt_put() in it
>> (svc_recv->svc_handle_xprt->svc_xprt_received->svc_xprt_put) . And if we are the only user - then
>> the transport will be destroyed. But transport is dereferenced later in svc_recv() after the
>> svc_handle_xprt call.
>
> Sounds like a reference count type of problem/solution (kref) (?)
>
No, that would be too simple.
Unfortunately, the problem is more complex. In a few words, the problem is in the dynamic creation/attaching
and destruction/detaching of resources (transports) for a running (!) SUNRPC service.
You have more than one NFS mount in different network namespaces, don't you?
--
Best regards,
Stanislav Kinsbursky
* Re: BUG at net/sunrpc/svc_xprt.c:921
2013-01-18 5:37 ` Stanislav Kinsbursky
@ 2013-01-18 15:48 ` Mark Lord
2013-01-18 15:56 ` J. Bruce Fields
0 siblings, 1 reply; 22+ messages in thread
From: Mark Lord @ 2013-01-18 15:48 UTC (permalink / raw)
To: Stanislav Kinsbursky; +Cc: J. Bruce Fields, linux-nfs, Linux Kernel
On 13-01-18 12:37 AM, Stanislav Kinsbursky wrote:
>
> You have more than one NFS mount in different network namespaces, haven't you?
>
No, I don't (knowingly) use (multiple) namespaces at all.
Usually I disable them in the kernel .config,
though it appears the currently running kernel has this:
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_NET_NS is not set
The full .config was attached to the first post in this thread.
Cheers
* Re: BUG at net/sunrpc/svc_xprt.c:921
2013-01-18 15:48 ` Mark Lord
@ 2013-01-18 15:56 ` J. Bruce Fields
2013-01-21 8:19 ` Stanislav Kinsbursky
0 siblings, 1 reply; 22+ messages in thread
From: J. Bruce Fields @ 2013-01-18 15:56 UTC (permalink / raw)
To: Mark Lord; +Cc: Stanislav Kinsbursky, linux-nfs, Linux Kernel
On Fri, Jan 18, 2013 at 10:48:02AM -0500, Mark Lord wrote:
> On 13-01-18 12:37 AM, Stanislav Kinsbursky wrote:
> >
> > You have more than one NFS mount in different network namespaces, haven't you?
> >
>
> No, I don't (knowingly) use (multiple) namespaces at all.
Right, I don't think that's necessary. Stanislav, look at
nfs_callback_down:
	nfs_callback_down_net(minorversion, cb_info->serv, net);
	cb_info->users--;
	if (cb_info->users == 0 && cb_info->task != NULL) {
		kthread_stop(cb_info->task);
		...
It's first destroying the service, then destroying the thread. That's
the wrong order. So we could still have the thread running svc_recv()
after the rpc service is destroyed.
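
The ordering concern can be illustrated with a toy, single-threaded model. It is a sketch under loud assumptions: every name below is hypothetical, nothing here is a kernel API, and the "thread" is just a function that is given one chance to run between the two teardown steps. It shows only why the order matters, not whether reordering is practical in the real code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: all names below are hypothetical, not kernel APIs. */
struct sk_service {
	bool destroyed;
	int late_recv;		/* receive passes made after destruction */
};

struct sk_thread {
	bool stopped;
	struct sk_service *serv;
};

/* One pass of the service thread's receive loop. */
static void sk_thread_pass(struct sk_thread *t)
{
	if (t->stopped)
		return;
	if (t->serv->destroyed)
		t->serv->late_recv++;
}

/* Tear down in the given order, letting the thread attempt one pass
 * between the two steps. Returns the number of late passes observed. */
static int sk_shutdown(bool stop_thread_first)
{
	struct sk_service serv = { .destroyed = false, .late_recv = 0 };
	struct sk_thread thread = { .stopped = false, .serv = &serv };

	if (stop_thread_first) {
		thread.stopped = true;
		sk_thread_pass(&thread);
		serv.destroyed = true;
	} else {
		serv.destroyed = true;
		sk_thread_pass(&thread);
		thread.stopped = true;
	}
	return serv.late_recv;
}
```

With the service destroyed first, the model records one receive pass against a destroyed service; with the thread stopped first, it records none.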
--b.
> Usually I disable them in the kernel .config,
> though it appears the currently running kernel has this:
>
> CONFIG_NAMESPACES=y
> # CONFIG_UTS_NS is not set
> # CONFIG_IPC_NS is not set
> # CONFIG_PID_NS is not set
> # CONFIG_NET_NS is not set
>
> The full .config was attached to the first post in this thread.
>
> Cheers
>
* Re: BUG at net/sunrpc/svc_xprt.c:921
2013-01-18 15:56 ` J. Bruce Fields
@ 2013-01-21 8:19 ` Stanislav Kinsbursky
0 siblings, 0 replies; 22+ messages in thread
From: Stanislav Kinsbursky @ 2013-01-21 8:19 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Mark Lord, linux-nfs, Linux Kernel
18.01.2013 19:56, J. Bruce Fields wrote:
> On Fri, Jan 18, 2013 at 10:48:02AM -0500, Mark Lord wrote:
>> On 13-01-18 12:37 AM, Stanislav Kinsbursky wrote:
>>>
>>> You have more than one NFS mount in different network namespaces, haven't you?
>>>
>>
>> No, I don't (knowingly) use (multiple) namespaces at all.
>
> Right, I don't think that's necessary. Stanislav, look at
> nfs_callback_down:
>
> nfs_callback_down_net(minorversion, cb_info->serv, net);
> cb_info->users--;
> if (cb_info->users == 0 && cb_info->task != NULL) {
> kthread_stop(cb_info->task);
> ...
>
> It's first destroying the service, then destroying the thread. That's
> the wrong order. So we could still have the thread running svc_recv()
> after the rpc service is destroyed.
>
Sadly, no, this can't be done as easily as you are proposing. Have a look at lockd_down_net() - it works in the same manner.
Moreover, service shutdown was significantly reworked to support working across multiple namespaces. We actually came to this solution in one of our previous
discussions because we were trying to reduce the number of running threads and the memory used for such lightly used services as lockd and the nfs
callback. And the existing approach works well enough, except for in-progress transports.
Now we can't just reorder the thread shutdown and the transports shutdown, because other problems will arise (the BUG_ON() in svc_destroy will trigger, and that's just the beginning).
I.e., to make it possible to shut down the service before the transports, the service should be rewritten to work in a network namespace context (but not across all namespaces).
> --b.
>
>> Usually I disable them in the kernel .config,
>> though it appears the currently running kernel has this:
>>
>> CONFIG_NAMESPACES=y
>> # CONFIG_UTS_NS is not set
>> # CONFIG_IPC_NS is not set
>> # CONFIG_PID_NS is not set
>> # CONFIG_NET_NS is not set
>>
>> The full .config was attached to the first post in this thread.
>>
>> Cheers
>>
--
Best regards,
Stanislav Kinsbursky
* Re: BUG at net/sunrpc/svc_xprt.c:921
2013-01-14 16:17 Mark Lord
2013-01-14 20:37 ` J. Bruce Fields
@ 2013-01-17 13:11 ` Mark Lord
2013-01-17 13:53 ` J. Bruce Fields
1 sibling, 1 reply; 22+ messages in thread
From: Mark Lord @ 2013-01-17 13:11 UTC (permalink / raw)
To: J. Bruce Fields, linux-nfs, Linux Kernel
On 13-01-14 11:17 AM, Mark Lord wrote:
>
> Here's the code with the BUG() at net/sunrpc/svc_xprt.c line 921:
>
> /*
> * Remove a dead transport
> */
> static void svc_delete_xprt(struct svc_xprt *xprt)
> {
> struct svc_serv *serv = xprt->xpt_server;
> struct svc_deferred_req *dr;
>
> /* Only do this once */
> if (test_and_set_bit(XPT_DEAD, &xprt->xpt_flags))
> BUG();
Shouldn't there also be a return statement after the BUG() line,
inside the if-stmt ?
I mean, the comment says "only do this once", but it actually
appears to end up doing it twice, despite the test.
??
* Re: BUG at net/sunrpc/svc_xprt.c:921
2013-01-17 13:11 ` Mark Lord
@ 2013-01-17 13:53 ` J. Bruce Fields
2013-01-17 23:40 ` Mark Lord
2013-02-25 20:45 ` Mark Lord
0 siblings, 2 replies; 22+ messages in thread
From: J. Bruce Fields @ 2013-01-17 13:53 UTC (permalink / raw)
To: Mark Lord; +Cc: linux-nfs, Linux Kernel
On Thu, Jan 17, 2013 at 08:11:52AM -0500, Mark Lord wrote:
> On 13-01-14 11:17 AM, Mark Lord wrote:
> >
> > Here's the code with the BUG() at net/sunrpc/svc_xprt.c line 921:
> >
> > /*
> > * Remove a dead transport
> > */
> > static void svc_delete_xprt(struct svc_xprt *xprt)
> > {
> > struct svc_serv *serv = xprt->xpt_server;
> > struct svc_deferred_req *dr;
> >
> > /* Only do this once */
> > if (test_and_set_bit(XPT_DEAD, &xprt->xpt_flags))
> > BUG();
>
>
> Shouldn't there also be a return statement after the BUG() line,
> inside the if-stmt ?
BUG() kills the thread that calls it, so it never returns, and a
following statement wouldn't be executed--I may not understand your
question.
--b.
>
> I mean, the comment says "only do this once", but it actually
> appears to end up doing it twice, despite the test.
>
> ??
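
To illustrate the point about BUG() and the "only do this once" test, here is a minimal userspace sketch in C11. It is not the kernel code: `fake_xprt`, `test_and_set_bit_sketch()`, `delete_xprt_sketch()` and `XPT_DEAD_BIT` are hypothetical names made up for this example, and `abort()` only stands in for BUG(). The idea is that an atomic test-and-set returns the previous value of the bit, so only the first caller sees 0, and the failure path never returns, so no `return` is needed after it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical bit index, for illustration only. */
#define XPT_DEAD_BIT 0

/* Hypothetical stand-in for a transport; not the kernel's struct svc_xprt. */
struct fake_xprt {
    atomic_ulong flags;   /* bit XPT_DEAD_BIT set once teardown has started */
    int teardown_count;   /* how many times teardown actually ran */
};

/* Atomically set bit nr in *addr and return the bit's previous value. */
static bool test_and_set_bit_sketch(unsigned int nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << nr;

    return (atomic_fetch_or(addr, mask) & mask) != 0;
}

/* Tear down a transport; a second call on the same transport is fatal. */
static void delete_xprt_sketch(struct fake_xprt *xprt)
{
    assert(xprt != NULL);

    /* Only do this once */
    if (test_and_set_bit_sketch(XPT_DEAD_BIT, &xprt->flags)) {
        fprintf(stderr, "transport deleted twice\n");
        abort();  /* stand-in for BUG(): does not return */
    }

    /* Reached only by the first caller. */
    xprt->teardown_count++;
}
```

A second call to `delete_xprt_sketch()` on the same object would hit `abort()` rather than run the teardown again, which is the userspace analogue of the thread being killed at the BUG() line.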
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG at net/sunrpc/svc_xprt.c:921
2013-01-17 13:53 ` J. Bruce Fields
@ 2013-01-17 23:40 ` Mark Lord
2013-02-25 20:45 ` Mark Lord
1 sibling, 0 replies; 22+ messages in thread
From: Mark Lord @ 2013-01-17 23:40 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs, Linux Kernel
On 13-01-17 08:53 AM, J. Bruce Fields wrote:
> On Thu, Jan 17, 2013 at 08:11:52AM -0500, Mark Lord wrote:
>> On 13-01-14 11:17 AM, Mark Lord wrote:
>>>
>>> Here's the code with the BUG() at net/sunrpc/svc_xprt.c line 921:
>>>
>>> /*
>>> * Remove a dead transport
>>> */
>>> static void svc_delete_xprt(struct svc_xprt *xprt)
>>> {
>>> struct svc_serv *serv = xprt->xpt_server;
>>> struct svc_deferred_req *dr;
>>>
>>> /* Only do this once */
>>> if (test_and_set_bit(XPT_DEAD, &xprt->xpt_flags))
>>> BUG();
>>
>>
>> Shouldn't there also be a return statement after the BUG() line,
>> inside the if-stmt ?
>
> BUG() kills the thread that calls it
Oh, does it? Well, taken care of then, I guess.
With a sledgehammer.
:)
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG at net/sunrpc/svc_xprt.c:921
2013-01-17 13:53 ` J. Bruce Fields
2013-01-17 23:40 ` Mark Lord
@ 2013-02-25 20:45 ` Mark Lord
2013-02-25 20:52 ` J. Bruce Fields
1 sibling, 1 reply; 22+ messages in thread
From: Mark Lord @ 2013-02-25 20:45 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Mark Lord, linux-nfs, Linux Kernel
On 13-01-17 08:53 AM, J. Bruce Fields wrote:
> On Thu, Jan 17, 2013 at 08:11:52AM -0500, Mark Lord wrote:
>> On 13-01-14 11:17 AM, Mark Lord wrote:
>>>
>>> Here's the code with the BUG() at net/sunrpc/svc_xprt.c line 921:
>>>
>>> /*
>>> * Remove a dead transport
>>> */
>>> static void svc_delete_xprt(struct svc_xprt *xprt)
>>> {
>>> struct svc_serv *serv = xprt->xpt_server;
>>> struct svc_deferred_req *dr;
>>>
>>> /* Only do this once */
>>> if (test_and_set_bit(XPT_DEAD, &xprt->xpt_flags))
>>> BUG();
>>
Saw this again today on 3.7.9 -- dunno if your changes are in that kernel yet though.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG at net/sunrpc/svc_xprt.c:921
2013-02-25 20:45 ` Mark Lord
@ 2013-02-25 20:52 ` J. Bruce Fields
0 siblings, 0 replies; 22+ messages in thread
From: J. Bruce Fields @ 2013-02-25 20:52 UTC (permalink / raw)
To: Mark Lord; +Cc: Mark Lord, linux-nfs, Linux Kernel
On Mon, Feb 25, 2013 at 03:45:07PM -0500, Mark Lord wrote:
> On 13-01-17 08:53 AM, J. Bruce Fields wrote:
> > On Thu, Jan 17, 2013 at 08:11:52AM -0500, Mark Lord wrote:
> >> On 13-01-14 11:17 AM, Mark Lord wrote:
> >>>
> >>> Here's the code with the BUG() at net/sunrpc/svc_xprt.c line 921:
> >>>
> >>> /*
> >>> * Remove a dead transport
> >>> */
> >>> static void svc_delete_xprt(struct svc_xprt *xprt)
> >>> {
> >>> struct svc_serv *serv = xprt->xpt_server;
> >>> struct svc_deferred_req *dr;
> >>>
> >>> /* Only do this once */
> >>> if (test_and_set_bit(XPT_DEAD, &xprt->xpt_flags))
> >>> BUG();
> >>
>
> Saw this again today on 3.7.9 -- dunno if your changes are in that kernel yet though.
Nope. The nfsd changes for 3.9 should get merged in a few days and then
backported to stable kernels not much later.
--b.
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2013-02-25 20:52 UTC | newest]
Thread overview: 22+ messages
2013-02-07 12:56 BUG at net/sunrpc/svc_xprt.c:921 Tom Horsley
2013-02-08 20:58 ` J. Bruce Fields
-- strict thread matches above, loose matches on Subject: below --
2013-01-14 16:17 Mark Lord
2013-01-14 20:37 ` J. Bruce Fields
2013-01-15 4:16 ` Mark Lord
2013-01-15 20:56 ` J. Bruce Fields
2013-01-16 5:20 ` Stanislav Kinsbursky
2013-01-16 22:51 ` Mark Lord
2013-01-16 22:58 ` Mark Lord
2013-01-17 5:05 ` Stanislav Kinsbursky
2013-01-17 13:03 ` J. Bruce Fields
2013-01-17 13:24 ` Stanislav Kinsbursky
2013-01-17 23:41 ` Mark Lord
2013-01-18 5:37 ` Stanislav Kinsbursky
2013-01-18 15:48 ` Mark Lord
2013-01-18 15:56 ` J. Bruce Fields
2013-01-21 8:19 ` Stanislav Kinsbursky
2013-01-17 13:11 ` Mark Lord
2013-01-17 13:53 ` J. Bruce Fields
2013-01-17 23:40 ` Mark Lord
2013-02-25 20:45 ` Mark Lord
2013-02-25 20:52 ` J. Bruce Fields