* Kernel (2.6.24) crash on nfsd (BUG: soft lockup)
@ 2008-02-26 15:48 Gertjan Oude Lohuis
[not found] ` <47C434D2.80601-DW70C6hi67U@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Gertjan Oude Lohuis @ 2008-02-26 15:48 UTC (permalink / raw)
To: linux-nfs
[-- Attachment #1: Type: text/plain, Size: 722 bytes --]
Hi!
One of our fileservers went down pretty hard yesterday. We recently
upgraded the kernel to 2.6.24 because we suffered from the lockd-lockup
with our previous kernel (2.6.18).
The server stopped responding completely to any requests (nfs, ssh,
ping) and every few seconds a stacktrace was dumped on the console. The
stacktraces hint at nfsd (Pid: 2716, comm: nfsd Not tainted
(2.6.24.2-fwsh-byte #2) and various nfs-functions in the trace). I
attached some of them to this message.
Do these stacktraces seem familiar to anyone? I couldn't find any
similar crashes with Google.
--
Met vriendelijke groet,
Gertjan Oude Lohuis
Byte Internet
W www.byte.nl
E support-DW70C6hi67U@public.gmane.org
F 020 6255 922
[-- Attachment #2: stacktrace.txt --]
[-- Type: text/plain, Size: 4973 bytes --]
BUG: soft lockup - CPU#0 stuck for 11s! [nfsd:2716]
Pid: 2716, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
EIP: 0060:[<c0140967>] EFLAGS: 00000286 CPU: 0
EIP is at find_get_pages_contig+0x67/0x73
EAX: 00000000 EBX: 00000001 ECX: c25cc520 EDX: c25cc520
ESI: 00000078 EDI: ca2fbdbc EBP: 00000001 ESP: dffb5c6c
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7f5d000 CR3: 1fc45000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
[<c017c49c>] __generic_file_splice_read+0xa2/0x41e
[<c0113b11>] sched_slice+0x15/0x6f
[<c0131291>] getnstimeofday+0x31/0x105
[<c0134301>] clockevents_program_event+0xbf/0x134
[<c012ef49>] ktime_get_ts+0x15/0x47
[<c01231ea>] run_timer_softirq+0x30/0x184
[<c012a893>] __rcu_process_callbacks+0x76/0xbb
[<c011f979>] tasklet_action+0x53/0x93
[<c011f754>] __do_softirq+0xba/0xcf
[<c010e20e>] smp_apic_timer_interrupt+0x2c/0x35
[<c01032e0>] apic_timer_interrupt+0x28/0x30
[<c017c88d>] generic_file_splice_read+0x75/0xc9
[<c017d083>] do_splice_to+0x6e/0x90
[<c017d144>] splice_direct_to_actor+0x9f/0x166
[<f8f2cf72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
[<c017c818>] generic_file_splice_read+0x0/0xc9
[<f8f2d309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
[<f8f2b3b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
[<c016014f>] dentry_open+0x34/0x64
[<f8f2d73c>] nfsd_read+0xee/0xfb [nfsd]
[<f8f33b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
[<f8f354cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
[<f8f29855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
[<c0445ab2>] svcauth_unix_set_client+0x116/0x165
[<c0441ad1>] svc_process+0x4e9/0x6b4
[<c01168e2>] default_wake_function+0x0/0x8
[<f8f2963d>] nfsd+0x16a/0x290 [nfsd]
[<f8f294d3>] nfsd+0x0/0x290 [nfsd]
[<c0103463>] kernel_thread_helper+0x7/0x10
=======================
BUG: soft lockup - CPU#0 stuck for 11s! [nfsd:2716]
Pid: 2716, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
EIP: 0060:[<c0140967>] EFLAGS: 00000286 CPU: 0
EIP is at find_get_pages_contig+0x67/0x73
EAX: 00000000 EBX: 00000001 ECX: c25cc520 EDX: c25cc520
ESI: 00000078 EDI: ca2fbdbc EBP: 00000001 ESP: dffb5c6c
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7f5d000 CR3: 1fc45000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
[<c017c49c>] __generic_file_splice_read+0xa2/0x41e
[<c0132efc>] clocksource_get_next+0x3a/0x40
[<c0113b11>] sched_slice+0x15/0x6f
[<c0131291>] getnstimeofday+0x31/0x105
[<c0134301>] clockevents_program_event+0xbf/0x134
[<c012ef49>] ktime_get_ts+0x15/0x47
[<c01231ea>] run_timer_softirq+0x30/0x184
[<c012a893>] __rcu_process_callbacks+0x76/0xbb
[<c011f979>] tasklet_action+0x53/0x93
[<c011f754>] __do_softirq+0xba/0xcf
[<c010e20e>] smp_apic_timer_interrupt+0x2c/0x35
[<c01032e0>] apic_timer_interrupt+0x28/0x30
[<c017007b>] locks_show+0x0/0x67
[<c017c88d>] generic_file_splice_read+0x75/0xc9
[<c017d083>] do_splice_to+0x6e/0x90
[<c017d144>] splice_direct_to_actor+0x9f/0x166
[<f8f2cf72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
[<c017c818>] generic_file_splice_read+0x0/0xc9
[<f8f2d309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
[<f8f2b3b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
[<c016014f>] dentry_open+0x34/0x64
[<f8f2d73c>] nfsd_read+0xee/0xfb [nfsd]
[<f8f33b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
[<f8f354cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
[<f8f29855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
[<c0445ab2>] svcauth_unix_set_client+0x116/0x165
[<c0441ad1>] svc_process+0x4e9/0x6b4
[<c01168e2>] default_wake_function+0x0/0x8
[<f8f2963d>] nfsd+0x16a/0x290 [nfsd]
[<f8f294d3>] nfsd+0x0/0x290 [nfsd]
[<c0103463>] kernel_thread_helper+0x7/0x10
=======================
BUG: soft lockup - CPU#0 stuck for 11s! [nfsd:2716]
Pid: 2716, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
EIP: 0060:[<c017c88f>] EFLAGS: 00000246 CPU: 0
EIP is at generic_file_splice_read+0x77/0xc9
EAX: 00000000 EBX: 00000000 ECX: 00000001 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: 00001000 ESP: dffb5df0
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7f5d000 CR3: 1fc45000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
[<c017d083>] do_splice_to+0x6e/0x90
[<c017d144>] splice_direct_to_actor+0x9f/0x166
[<f8f2cf72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
[<c017c818>] generic_file_splice_read+0x0/0xc9
[<f8f2d309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
[<f8f2b3b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
[<c016014f>] dentry_open+0x34/0x64
[<f8f2d73c>] nfsd_read+0xee/0xfb [nfsd]
[<f8f33b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
[<f8f354cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
[<f8f29855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
[<c0445ab2>] svcauth_unix_set_client+0x116/0x165
[<c0441ad1>] svc_process+0x4e9/0x6b4
[<c01168e2>] default_wake_function+0x0/0x8
[<f8f2963d>] nfsd+0x16a/0x290 [nfsd]
[<f8f294d3>] nfsd+0x0/0x290 [nfsd]
[<c0103463>] kernel_thread_helper+0x7/0x10
=======================
^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <47C434D2.80601-DW70C6hi67U@public.gmane.org>]
* Re: Kernel (2.6.24) crash on nfsd (BUG: soft lockup)
[not found] ` <47C434D2.80601-DW70C6hi67U@public.gmane.org>
@ 2008-02-27 6:46 ` Gertjan Oude Lohuis
[not found] ` <47C50754.5030107-DW70C6hi67U@public.gmane.org>
2008-02-28 11:08 ` Kernel (2.6.24) crash on nfsd (BUG: soft lockup) Gertjan Oude Lohuis
1 sibling, 1 reply; 8+ messages in thread
From: Gertjan Oude Lohuis @ 2008-02-27 6:46 UTC (permalink / raw)
To: linux-nfs
[-- Attachment #1: Type: text/plain, Size: 918 bytes --]
Gertjan Oude Lohuis wrote:
> One of our fileservers went down pretty hard yesterday. We recently
> upgraded the kernel to 2.6.24 because we suffered from the
> lockd-lockup with our previous kernel (2.6.18).
> The server stopped responding completely to any requests (nfs, ssh,
> ping) and every few seconds a stacktrace was dumped on the console.
> The stacktraces hint at nfsd (Pid: 2716, comm: nfsd Not tainted
> (2.6.24.2-fwsh-byte #2) and various nfs-functions in the trace). I
> attached some of them to this message.
This morning the same server crashed again, with the same stacktrace (at
least to my eyes :-)). I think we'll be downgrading to 2.6.23 as soon as
possible.
Is there anything I can do to get more debug information? Now or when it
crashes? When the server crashes, I'm able to log in to it with the
serial console, and reboot it with 'send break -> b'.
Regards,
Gertjan Oude Lohuis
[-- Attachment #2: stacktrace.txt --]
[-- Type: text/plain, Size: 3610 bytes --]
BUG: soft lockup - CPU#2 stuck for 11s! [nfsd:2775]
Pid: 2775, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
EIP: 0060:[<c0140967>] EFLAGS: 00000286 CPU: 2
EIP is at find_get_pages_contig+0x67/0x73
EAX: 00000000 EBX: 00000002 ECX: c235c8e0 EDX: c235c8e0
ESI: 00000110 EDI: d7f2c69c EBP: 00000002 ESP: f604fc6c
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7fb8000 CR3: 355be000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
[<c017c49c>] __generic_file_splice_read+0xa2/0x41e
[<c0113b11>] sched_slice+0x15/0x6f
[<c0131291>] getnstimeofday+0x31/0x105
[<c0134301>] clockevents_program_event+0xbf/0x134
[<c012ef49>] ktime_get_ts+0x15/0x47
[<c01231ea>] run_timer_softirq+0x30/0x184
[<c012a893>] __rcu_process_callbacks+0x76/0xbb
[<c011f979>] tasklet_action+0x53/0x93
[<c011f754>] __do_softirq+0xba/0xcf
[<c010e20e>] smp_apic_timer_interrupt+0x2c/0x35
[<c01032e0>] apic_timer_interrupt+0x28/0x30
[<c017c88d>] generic_file_splice_read+0x75/0xc9
[<c017d083>] do_splice_to+0x6e/0x90
[<c017d144>] splice_direct_to_actor+0x9f/0x166
[<f8f32f72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
[<c017c818>] generic_file_splice_read+0x0/0xc9
[<f8f33309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
[<f8f313b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
[<c016014f>] dentry_open+0x34/0x64
[<f8f3373c>] nfsd_read+0xee/0xfb [nfsd]
[<f8f39b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
[<f8f3b4cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
[<f8f2f855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
[<c0445ab2>] svcauth_unix_set_client+0x116/0x165
[<c0441ad1>] svc_process+0x4e9/0x6b4
[<c01168e2>] default_wake_function+0x0/0x8
[<f8f2f63d>] nfsd+0x16a/0x290 [nfsd]
[<f8f2f4d3>] nfsd+0x0/0x290 [nfsd]
[<c0103463>] kernel_thread_helper+0x7/0x10
=======================
BUG: soft lockup - CPU#2 stuck for 11s! [nfsd:2775]
Pid: 2775, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
EIP: 0060:[<c0140967>] EFLAGS: 00000286 CPU: 2
EIP is at find_get_pages_contig+0x67/0x73
EAX: 00000000 EBX: 00000002 ECX: c235c8e0 EDX: c235c8e0
ESI: 00000110 EDI: d7f2c69c EBP: 00000002 ESP: f604fc6c
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7fb8000 CR3: 355be000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
[<c017c49c>] __generic_file_splice_read+0xa2/0x41e
[<c0113b11>] sched_slice+0x15/0x6f
[<c0131291>] getnstimeofday+0x31/0x105
[<c0134301>] clockevents_program_event+0xbf/0x134
[<c012ef49>] ktime_get_ts+0x15/0x47
[<c01231ea>] run_timer_softirq+0x30/0x184
[<c012a893>] __rcu_process_callbacks+0x76/0xbb
[<c011f979>] tasklet_action+0x53/0x93
[<c011f754>] __do_softirq+0xba/0xcf
[<c010e20e>] smp_apic_timer_interrupt+0x2c/0x35
[<c01032e0>] apic_timer_interrupt+0x28/0x30
[<c017c88d>] generic_file_splice_read+0x75/0xc9
[<c017d083>] do_splice_to+0x6e/0x90
[<c017d144>] splice_direct_to_actor+0x9f/0x166
[<f8f32f72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
[<c017c818>] generic_file_splice_read+0x0/0xc9
[<f8f33309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
[<f8f313b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
[<c016014f>] dentry_open+0x34/0x64
[<f8f3373c>] nfsd_read+0xee/0xfb [nfsd]
[<f8f39b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
[<f8f3b4cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
[<f8f2f855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
[<c0445ab2>] svcauth_unix_set_client+0x116/0x165
[<c0441ad1>] svc_process+0x4e9/0x6b4
[<c01168e2>] default_wake_function+0x0/0x8
[<f8f2f63d>] nfsd+0x16a/0x290 [nfsd]
[<f8f2f4d3>] nfsd+0x0/0x290 [nfsd]
[<c0103463>] kernel_thread_helper+0x7/0x10
=======================
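On the question of getting more debug information: the 'send break -> b' sequence mentioned in the message above is the magic SysRq reboot command, and the same mechanism has a `t` command that asks the kernel to dump the state of all tasks to the kernel log. The following is a minimal sketch in C, assuming the kernel was built with magic SysRq support and the program runs as root; `send_sysrq` is a hypothetical helper name, not part of any library, and whether the extra output would have pinpointed this particular lockup is not something the thread confirms.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes one SysRq command character to the given
 * trigger file (normally /proc/sysrq-trigger).
 * Returns 0 on success, -1 on failure. */
static int send_sysrq(const char *trigger_path, char cmd)
{
    FILE *f = fopen(trigger_path, "w");
    if (f == NULL)
        return -1;

    int ok = (fputc(cmd, f) != EOF);
    /* fclose() flushes the buffer, so a failed write can surface here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling `send_sysrq("/proc/sysrq-trigger", 't')` would request the task dump; over a serial console the equivalent is sending a break followed by `t`.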
[parent not found: <47C50754.5030107-DW70C6hi67U@public.gmane.org>]
* Re: Kernel (2.6.24) crash on nfsd (BUG: soft lockup)
[not found] ` <47C50754.5030107-DW70C6hi67U@public.gmane.org>
@ 2008-02-27 7:01 ` Gertjan Oude Lohuis
[not found] ` <47C50ABB.8050700-DW70C6hi67U@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Gertjan Oude Lohuis @ 2008-02-27 7:01 UTC (permalink / raw)
To: linux-nfs
[-- Attachment #1: Type: text/plain, Size: 843 bytes --]
Gertjan Oude Lohuis wrote:
> This morning the same server crashed again, with the same stacktrace
> (at least to my eyes :-)). I think we'll be downgrading to 2.6.23 as
> soon as possible. Is there anything I can do to get more debug
> information? Now or when it crashes? When the server crashes, I'm able
> to log in to it with the serial console, and reboot it with 'send
> break -> b'.
This keeps getting weirder. When browsing the server's logfiles, I
noticed that the server has exactly the same errors in /var/log/messages
yesterday night, around 1:52 AM. However, the server did not crash then.
We didn't notice earlier, because most notifications are suppressed
during the night.
Apparently, Linux can recover from this bug, given enough time. What
expert can help me understand this problem?
Regards,
Gertjan Oude Lohuis
[-- Attachment #2: stacktrace2.txt --]
[-- Type: text/plain, Size: 12059 bytes --]
Feb 26 01:52:00 file1 kernel: BUG: soft lockup - CPU#3 stuck for 11s! [nfsd:2775]
Feb 26 01:52:00 file1 kernel:
Feb 26 01:52:00 file1 kernel: Pid: 2775, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
Feb 26 01:52:00 file1 kernel: EIP: 0060:[<c0147be0>] EFLAGS: 00000246 CPU: 3
Feb 26 01:52:00 file1 kernel: EIP is at put_page+0x9/0x20
Feb 26 01:52:00 file1 kernel: EAX: 80000008 EBX: 00000000 ECX: 00000002 EDX: c2a71240
Feb 26 01:52:00 file1 kernel: ESI: 00000000 EDI: e6ee08fc EBP: 00000087 ESP: f604fc7c
Feb 26 01:52:00 file1 kernel: DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Feb 26 01:52:00 file1 kernel: CR0: 8005003b CR2: 080a7070 CR3: 36cbd000 CR4: 000006f0
Feb 26 01:52:00 file1 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Feb 26 01:52:00 file1 kernel: DR6: ffff0ff0 DR7: 00000400
Feb 26 01:52:00 file1 kernel: [<c017c6bc>] __generic_file_splice_read+0x2c2/0x41e
Feb 26 01:52:00 file1 kernel: [<c0113b11>] sched_slice+0x15/0x6f
Feb 26 01:52:00 file1 kernel: [<c0131291>] getnstimeofday+0x31/0x105
Feb 26 01:52:01 file1 kernel: [<c0134301>] clockevents_program_event+0xbf/0x134
Feb 26 01:52:01 file1 kernel: [<c012ef49>] ktime_get_ts+0x15/0x47
Feb 26 01:52:01 file1 kernel: [<c01231ea>] run_timer_softirq+0x30/0x184
Feb 26 01:52:01 file1 kernel: [<c012a893>] __rcu_process_callbacks+0x76/0xbb
Feb 26 01:52:01 file1 kernel: [<c011f979>] tasklet_action+0x53/0x93
Feb 26 01:52:01 file1 kernel: [<c011f754>] __do_softirq+0xba/0xcf
Feb 26 01:52:01 file1 kernel: [<c010e20e>] smp_apic_timer_interrupt+0x2c/0x35
Feb 26 01:52:01 file1 kernel: [<c01032e0>] apic_timer_interrupt+0x28/0x30
Feb 26 01:52:01 file1 kernel: [<c017c88d>] generic_file_splice_read+0x75/0xc9
Feb 26 01:52:01 file1 kernel: [<c017d083>] do_splice_to+0x6e/0x90
Feb 26 01:52:01 file1 kernel: [<c017d144>] splice_direct_to_actor+0x9f/0x166
Feb 26 01:52:01 file1 kernel: [<f8f32f72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
Feb 26 01:52:01 file1 kernel: [<c017c818>] generic_file_splice_read+0x0/0xc9
Feb 26 01:52:01 file1 kernel: [<f8f33309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
Feb 26 01:52:01 file1 kernel: [<f8f313b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
Feb 26 01:52:01 file1 kernel: [<c016014f>] dentry_open+0x34/0x64
Feb 26 01:52:01 file1 kernel: [<f8f3373c>] nfsd_read+0xee/0xfb [nfsd]
Feb 26 01:52:01 file1 kernel: [<f8f39b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
Feb 26 01:52:01 file1 kernel: [<f8f3b4cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
Feb 26 01:52:01 file1 kernel: [<f8f2f855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
Feb 26 01:52:01 file1 kernel: [<c0445ab2>] svcauth_unix_set_client+0x116/0x165
Feb 26 01:52:02 file1 kernel: [<c0441ad1>] svc_process+0x4e9/0x6b4
Feb 26 01:52:02 file1 kernel: [<c01168e2>] default_wake_function+0x0/0x8
Feb 26 01:52:02 file1 kernel: [<f8f2f63d>] nfsd+0x16a/0x290 [nfsd]
Feb 26 01:52:02 file1 kernel: [<f8f2f4d3>] nfsd+0x0/0x290 [nfsd]
Feb 26 01:52:02 file1 kernel: [<c0103463>] kernel_thread_helper+0x7/0x10
Feb 26 01:52:02 file1 kernel: =======================
Feb 26 01:52:14 file1 kernel: BUG: soft lockup - CPU#3 stuck for 11s! [nfsd:2775]
Feb 26 01:52:14 file1 kernel:
Feb 26 01:52:14 file1 kernel: Pid: 2775, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
Feb 26 01:52:14 file1 kernel: EIP: 0060:[<c0140967>] EFLAGS: 00000286 CPU: 3
Feb 26 01:52:14 file1 kernel: EIP is at find_get_pages_contig+0x67/0x73
Feb 26 01:52:14 file1 kernel: EAX: 00000000 EBX: 00000002 ECX: c2a71260 EDX: c2a71260
Feb 26 01:52:14 file1 kernel: ESI: 00000089 EDI: e6ee09ac EBP: 00000002 ESP: f604fc6c
Feb 26 01:52:14 file1 kernel: DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Feb 26 01:52:14 file1 kernel: CR0: 8005003b CR2: 080a7070 CR3: 36cbd000 CR4: 000006f0
Feb 26 01:52:14 file1 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Feb 26 01:52:14 file1 kernel: DR6: ffff0ff0 DR7: 00000400
Feb 26 01:52:14 file1 kernel: [<c017c49c>] __generic_file_splice_read+0xa2/0x41e
Feb 26 01:52:14 file1 kernel: [<c0113b11>] sched_slice+0x15/0x6f
Feb 26 01:52:14 file1 kernel: [<c0131291>] getnstimeofday+0x31/0x105
Feb 26 01:52:14 file1 kernel: [<c0134301>] clockevents_program_event+0xbf/0x134
Feb 26 01:52:14 file1 kernel: [<c012ef49>] ktime_get_ts+0x15/0x47
Feb 26 01:52:14 file1 kernel: [<c01231ea>] run_timer_softirq+0x30/0x184
Feb 26 01:52:14 file1 kernel: [<c012a893>] __rcu_process_callbacks+0x76/0xbb
Feb 26 01:52:14 file1 kernel: [<c011f979>] tasklet_action+0x53/0x93
Feb 26 01:52:14 file1 kernel: [<c011f754>] __do_softirq+0xba/0xcf
Feb 26 01:52:14 file1 kernel: [<c010e20e>] smp_apic_timer_interrupt+0x2c/0x35
Feb 26 01:52:15 file1 kernel: [<c01032e0>] apic_timer_interrupt+0x28/0x30
Feb 26 01:52:15 file1 kernel: [<c017c88d>] generic_file_splice_read+0x75/0xc9
Feb 26 01:52:15 file1 kernel: [<c017d083>] do_splice_to+0x6e/0x90
Feb 26 01:52:15 file1 kernel: [<c017d144>] splice_direct_to_actor+0x9f/0x166
Feb 26 01:52:15 file1 kernel: [<f8f32f72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
Feb 26 01:52:15 file1 kernel: [<c017c818>] generic_file_splice_read+0x0/0xc9
Feb 26 01:52:15 file1 kernel: [<f8f33309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
Feb 26 01:52:15 file1 kernel: [<f8f313b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
Feb 26 01:52:15 file1 kernel: [<c016014f>] dentry_open+0x34/0x64
Feb 26 01:52:15 file1 kernel: [<f8f3373c>] nfsd_read+0xee/0xfb [nfsd]
Feb 26 01:52:15 file1 kernel: [<f8f39b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
Feb 26 01:52:15 file1 kernel: [<f8f3b4cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
Feb 26 01:52:15 file1 kernel: [<f8f2f855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
Feb 26 01:52:15 file1 kernel: [<c0445ab2>] svcauth_unix_set_client+0x116/0x165
Feb 26 01:52:15 file1 kernel: [<c0441ad1>] svc_process+0x4e9/0x6b4
Feb 26 01:52:15 file1 kernel: [<c01168e2>] default_wake_function+0x0/0x8
Feb 26 01:52:15 file1 kernel: [<f8f2f63d>] nfsd+0x16a/0x290 [nfsd]
Feb 26 01:52:15 file1 kernel: [<f8f2f4d3>] nfsd+0x0/0x290 [nfsd]
Feb 26 01:52:15 file1 kernel: [<c0103463>] kernel_thread_helper+0x7/0x10
Feb 26 01:52:15 file1 kernel: =======================
Feb 26 01:52:27 file1 kernel: BUG: soft lockup - CPU#3 stuck for 11s! [nfsd:2775]
Feb 26 01:52:27 file1 kernel:
Feb 26 01:52:27 file1 kernel: Pid: 2775, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
Feb 26 01:52:27 file1 kernel: EIP: 0060:[<c014096a>] EFLAGS: 00000286 CPU: 3
Feb 26 01:52:27 file1 kernel: EIP is at find_get_pages_contig+0x6a/0x73
Feb 26 01:52:27 file1 kernel: EAX: 00000002 EBX: 00000002 ECX: c2a71260 EDX: c2a71260
Feb 26 01:52:28 file1 kernel: ESI: 00000089 EDI: e6ee09ac EBP: 00000002 ESP: f604fc70
Feb 26 01:52:28 file1 kernel: DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Feb 26 01:52:28 file1 kernel: CR0: 8005003b CR2: 080a7070 CR3: 36cbd000 CR4: 000006f0
Feb 26 01:52:28 file1 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Feb 26 01:52:28 file1 kernel: DR6: ffff0ff0 DR7: 00000400
Feb 26 01:52:28 file1 kernel: [<c017c49c>] __generic_file_splice_read+0xa2/0x41e
Feb 26 01:52:28 file1 kernel: [<c0113b11>] sched_slice+0x15/0x6f
Feb 26 01:52:28 file1 kernel: [<c0131291>] getnstimeofday+0x31/0x105
Feb 26 01:52:28 file1 kernel: [<c0134301>] clockevents_program_event+0xbf/0x134
Feb 26 01:52:28 file1 kernel: [<c012ef49>] ktime_get_ts+0x15/0x47
Feb 26 01:52:28 file1 kernel: [<c01231ea>] run_timer_softirq+0x30/0x184
Feb 26 01:52:28 file1 kernel: [<c012a893>] __rcu_process_callbacks+0x76/0xbb
Feb 26 01:52:28 file1 kernel: [<c011f979>] tasklet_action+0x53/0x93
Feb 26 01:52:28 file1 kernel: [<c011f754>] __do_softirq+0xba/0xcf
Feb 26 01:52:28 file1 kernel: [<c010e20e>] smp_apic_timer_interrupt+0x2c/0x35
Feb 26 01:52:28 file1 kernel: [<c01032e0>] apic_timer_interrupt+0x28/0x30
Feb 26 01:52:28 file1 kernel: [<c01700d8>] locks_show+0x5d/0x67
Feb 26 01:52:28 file1 kernel: [<c017c88d>] generic_file_splice_read+0x75/0xc9
Feb 26 01:52:28 file1 kernel: [<c017d083>] do_splice_to+0x6e/0x90
Feb 26 01:52:28 file1 kernel: [<c017d144>] splice_direct_to_actor+0x9f/0x166
Feb 26 01:52:29 file1 kernel: [<f8f32f72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
Feb 26 01:52:29 file1 kernel: [<c017c818>] generic_file_splice_read+0x0/0xc9
Feb 26 01:52:29 file1 kernel: [<f8f33309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
Feb 26 01:52:29 file1 kernel: [<f8f313b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
Feb 26 01:52:29 file1 kernel: [<c016014f>] dentry_open+0x34/0x64
Feb 26 01:52:29 file1 kernel: [<f8f3373c>] nfsd_read+0xee/0xfb [nfsd]
Feb 26 01:52:29 file1 kernel: [<f8f39b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
Feb 26 01:52:29 file1 kernel: [<f8f3b4cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
Feb 26 01:52:29 file1 kernel: [<f8f2f855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
Feb 26 01:52:29 file1 kernel: [<c0445ab2>] svcauth_unix_set_client+0x116/0x165
Feb 26 01:52:29 file1 kernel: [<c0441ad1>] svc_process+0x4e9/0x6b4
Feb 26 01:52:29 file1 kernel: [<c01168e2>] default_wake_function+0x0/0x8
Feb 26 01:52:29 file1 kernel: [<f8f2f63d>] nfsd+0x16a/0x290 [nfsd]
Feb 26 01:52:29 file1 kernel: [<f8f2f4d3>] nfsd+0x0/0x290 [nfsd]
Feb 26 01:52:29 file1 kernel: [<c0103463>] kernel_thread_helper+0x7/0x10
Feb 26 01:52:29 file1 kernel: =======================
Feb 26 01:52:41 file1 kernel: BUG: soft lockup - CPU#3 stuck for 11s! [nfsd:2775]
Feb 26 01:52:41 file1 kernel:
Feb 26 01:52:41 file1 kernel: Pid: 2775, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
Feb 26 01:52:41 file1 kernel: EIP: 0060:[<c0140967>] EFLAGS: 00000286 CPU: 3
Feb 26 01:52:41 file1 kernel: EIP is at find_get_pages_contig+0x67/0x73
Feb 26 01:52:41 file1 kernel: EAX: 00000000 EBX: 00000002 ECX: c2a71260 EDX: c2a71260
Feb 26 01:52:41 file1 kernel: ESI: 00000089 EDI: e6ee09ac EBP: 00000002 ESP: f604fc6c
Feb 26 01:52:41 file1 kernel: DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Feb 26 01:52:41 file1 kernel: CR0: 8005003b CR2: 080a7070 CR3: 36cbd000 CR4: 000006f0
Feb 26 01:52:41 file1 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Feb 26 01:52:42 file1 kernel: DR6: ffff0ff0 DR7: 00000400
Feb 26 01:52:42 file1 kernel: [<c017c49c>] __generic_file_splice_read+0xa2/0x41e
Feb 26 01:52:42 file1 kernel: [<c0113b11>] sched_slice+0x15/0x6f
Feb 26 01:52:42 file1 kernel: [<c0131291>] getnstimeofday+0x31/0x105
Feb 26 01:52:42 file1 kernel: [<c0134301>] clockevents_program_event+0xbf/0x134
Feb 26 01:52:42 file1 kernel: [<c012ef49>] ktime_get_ts+0x15/0x47
Feb 26 01:52:42 file1 kernel: [<c01231ea>] run_timer_softirq+0x30/0x184
Feb 26 01:52:42 file1 kernel: [<c012a893>] __rcu_process_callbacks+0x76/0xbb
Feb 26 01:52:42 file1 kernel: [<c011f979>] tasklet_action+0x53/0x93
Feb 26 01:52:42 file1 kernel: [<c011f754>] __do_softirq+0xba/0xcf
Feb 26 01:52:42 file1 kernel: [<c010e20e>] smp_apic_timer_interrupt+0x2c/0x35
Feb 26 01:52:42 file1 kernel: [<c01032e0>] apic_timer_interrupt+0x28/0x30
Feb 26 01:52:42 file1 kernel: [<c017c88d>] generic_file_splice_read+0x75/0xc9
Feb 26 01:52:42 file1 kernel: [<c017d083>] do_splice_to+0x6e/0x90
Feb 26 01:52:42 file1 kernel: [<c017d144>] splice_direct_to_actor+0x9f/0x166
Feb 26 01:52:42 file1 kernel: [<f8f32f72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
Feb 26 01:52:42 file1 kernel: [<c017c818>] generic_file_splice_read+0x0/0xc9
Feb 26 01:52:42 file1 kernel: [<f8f33309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
Feb 26 01:52:42 file1 kernel: [<f8f313b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
Feb 26 01:52:42 file1 kernel: [<c016014f>] dentry_open+0x34/0x64
Feb 26 01:52:43 file1 kernel: [<f8f3373c>] nfsd_read+0xee/0xfb [nfsd]
Feb 26 01:52:43 file1 kernel: [<f8f39b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
Feb 26 01:52:43 file1 kernel: [<f8f3b4cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
Feb 26 01:52:43 file1 kernel: [<f8f2f855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
Feb 26 01:52:43 file1 kernel: [<c0445ab2>] svcauth_unix_set_client+0x116/0x165
Feb 26 01:52:43 file1 kernel: [<c0441ad1>] svc_process+0x4e9/0x6b4
Feb 26 01:52:43 file1 kernel: [<c01168e2>] default_wake_function+0x0/0x8
Feb 26 01:52:43 file1 kernel: [<f8f2f63d>] nfsd+0x16a/0x290 [nfsd]
Feb 26 01:52:43 file1 kernel: [<f8f2f4d3>] nfsd+0x0/0x290 [nfsd]
Feb 26 01:52:43 file1 kernel: [<c0103463>] kernel_thread_helper+0x7/0x10
Feb 26 01:52:43 file1 kernel: =======================
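The syslog excerpt above repeats the same report header every 13 to 14 seconds. When scanning a long /var/log/messages for such episodes, it can help to pull the CPU, duration, task name and pid out of each header line. The following is a minimal sketch in C, assuming the header format shown in the traces above; `struct lockup_report` and `parse_lockup_line` are illustrative names, not an existing API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields carried by a "BUG: soft lockup" header line. */
struct lockup_report {
    int cpu;        /* CPU number from "CPU#N" */
    int seconds;    /* duration from "stuck for Ns!" */
    char comm[16];  /* task name, e.g. "nfsd" */
    int pid;        /* pid of the stuck task */
};

/* Returns 1 and fills *out if the line contains a soft-lockup header,
 * otherwise returns 0. Any prefix before "BUG: soft lockup" (such as a
 * syslog timestamp and hostname) is skipped. */
static int parse_lockup_line(const char *line, struct lockup_report *out)
{
    const char *p = strstr(line, "BUG: soft lockup");
    if (p == NULL)
        return 0;

    /* %15[^:] reads the task name up to the colon, bounded to fit comm. */
    return sscanf(p, "BUG: soft lockup - CPU#%d stuck for %ds! [%15[^:]:%d]",
                  &out->cpu, &out->seconds, out->comm, &out->pid) == 4;
}
```

Feeding each log line through `parse_lockup_line` and counting hits per `cpu` and `pid` gives a quick picture of whether a single nfsd thread is stuck repeatedly, as it appears to be in the excerpt.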
[parent not found: <47C50ABB.8050700-DW70C6hi67U@public.gmane.org>]
* Kernel 2.6.23.17 crash (Was: Kernel (2.6.24) crash on nfsd (BUG: soft lockup))
[not found] ` <47C50ABB.8050700-DW70C6hi67U@public.gmane.org>
@ 2008-02-28 10:56 ` Allard Hoeve
[not found] ` <Pine.LNX.4.62.0802281153040.31013-FHjt3+7qfYHBZBx2VKNGNcSTQT6m/s+e@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Allard Hoeve @ 2008-02-28 10:56 UTC (permalink / raw)
To: linux-nfs
Dear Mailinglist,
After trying 2.6.23.17, the same happened. The stacktrace is a bit
different, but they are comparable.
Is this an NFS problem in the first place? Where could we go for help
with this problem?
Regards,
Allard Hoeve
Pid: 2643, comm: nfsd
EIP: 0060:[<c0179a3a>] CPU: 3
EIP is at __generic_file_splice_read+0x12c/0x418
EFLAGS: 00000206 Not tainted (2.6.23.17-fwsh-byte #3)
EAX: f6e9dddc EBX: 00001000 ECX: 00000001 EDX: 00000000
ESI: 00000000 EDI: f6e9dcd0 EBP: 00000095 DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: b7e72cc0 CR3: 00622000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
[<c0113cbb>] __check_preempt_curr_fair+0x4b/0x7d
[<c0113dcb>] entity_tick+0x47/0x54
[<c013006b>] getnstimeofday+0x37/0x111
[<c0132fb6>] clockevents_program_event+0xac/0xcc
[<c0122996>] run_timer_softirq+0x30/0x184
[<c012f36f>] hrtimer_interrupt+0x132/0x1c4
[<c011f0e0>] __do_softirq+0xba/0xcf
[<c010da6a>] smp_apic_timer_interrupt+0x2c/0x35
[<c01032bc>] apic_timer_interrupt+0x28/0x30
[<c0179da7>] generic_file_splice_read+0x81/0xd5
[<c017a6b0>] do_splice_to+0x75/0x97
[<c017a771>] splice_direct_to_actor+0x9f/0x166
[<f8f2a494>] nfsd_acceptable+0x0/0xd1 [nfsd]
[<f8f2c247>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
[<f8f2c5ea>] nfsd_vfs_read+0x399/0x3bd [nfsd]
[<c015d57f>] dentry_open+0x34/0x64
[<f8f2ca1d>] nfsd_read+0xee/0xfb [nfsd]
[<f8f332ab>] nfsd3_proc_read+0xfe/0x186 [nfsd]
[<f8f34cd9>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
[<f8f28847>] nfsd_dispatch+0xc5/0x1ca [nfsd]
[<c043ab82>] svcauth_unix_set_client+0x116/0x165
[<c0436b96>] svc_process+0x4fb/0x6d4
[<c01164ad>] default_wake_function+0x0/0xc
[<f8f2863d>] nfsd+0x16a/0x282 [nfsd]
[<f8f284d3>] nfsd+0x0/0x282 [nfsd]
[<c010343f>] kernel_thread_helper+0x7/0x10
[parent not found: <Pine.LNX.4.62.0802281153040.31013-FHjt3+7qfYHBZBx2VKNGNcSTQT6m/s+e@public.gmane.org>]
* Re: Kernel 2.6.23.17 crash (Was: Kernel (2.6.24) crash on nfsd (BUG: soft lockup))
[not found] ` <Pine.LNX.4.62.0802281153040.31013-FHjt3+7qfYHBZBx2VKNGNcSTQT6m/s+e@public.gmane.org>
@ 2008-03-01 16:39 ` J. Bruce Fields
2008-03-01 17:03 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: J. Bruce Fields @ 2008-03-01 16:39 UTC (permalink / raw)
To: Allard Hoeve; +Cc: linux-nfs, Nilssen, Rune, Gertjan Oude Lohuis, Jens Axboe
On Thu, Feb 28, 2008 at 11:56:51AM +0100, Allard Hoeve wrote:
> After trying 2.6.23.17, the same happened. The stacktrace is a bit
> different, but they are comparable.
>
> Is this an NFS problem in the first place? Where could we go for help
> with this problem?
Thanks for the reports!
So, the summary: several people are reporting soft lockup warnings with
__generic_file_splice_read as the latest or next-to-latest function on
the stack. Sounds like 2.6.18 is good, various kernels around 2.6.23
and 2.6.24 are reported bad. Is it possible this was a regression
introduced by the splice changes?
--b.
>
> Regards,
>
> Allard Hoeve
>
>
>
> Pid: 2643, comm: nfsd
> EIP: 0060:[<c0179a3a>] CPU: 3
> EIP is at __generic_file_splice_read+0x12c/0x418
> EFLAGS: 00000206 Not tainted (2.6.23.17-fwsh-byte #3)
> EAX: f6e9dddc EBX: 00001000 ECX: 00000001 EDX: 00000000
> ESI: 00000000 EDI: f6e9dcd0 EBP: 00000095 DS: 007b ES: 007b FS: 00d8
> CR0: 8005003b CR2: b7e72cc0 CR3: 00622000 CR4: 000006f0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff0ff0 DR7: 00000400
> [<c0113cbb>] __check_preempt_curr_fair+0x4b/0x7d
> [<c0113dcb>] entity_tick+0x47/0x54
> [<c013006b>] getnstimeofday+0x37/0x111
> [<c0132fb6>] clockevents_program_event+0xac/0xcc
> [<c0122996>] run_timer_softirq+0x30/0x184
> [<c012f36f>] hrtimer_interrupt+0x132/0x1c4
> [<c011f0e0>] __do_softirq+0xba/0xcf
> [<c010da6a>] smp_apic_timer_interrupt+0x2c/0x35
> [<c01032bc>] apic_timer_interrupt+0x28/0x30
> [<c0179da7>] generic_file_splice_read+0x81/0xd5
> [<c017a6b0>] do_splice_to+0x75/0x97
> [<c017a771>] splice_direct_to_actor+0x9f/0x166
> [<f8f2a494>] nfsd_acceptable+0x0/0xd1 [nfsd]
> [<f8f2c247>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
> [<f8f2c5ea>] nfsd_vfs_read+0x399/0x3bd [nfsd]
> [<c015d57f>] dentry_open+0x34/0x64
> [<f8f2ca1d>] nfsd_read+0xee/0xfb [nfsd]
> [<f8f332ab>] nfsd3_proc_read+0xfe/0x186 [nfsd]
> [<f8f34cd9>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
> [<f8f28847>] nfsd_dispatch+0xc5/0x1ca [nfsd]
> [<c043ab82>] svcauth_unix_set_client+0x116/0x165
> [<c0436b96>] svc_process+0x4fb/0x6d4
> [<c01164ad>] default_wake_function+0x0/0xc
> [<f8f2863d>] nfsd+0x16a/0x282 [nfsd]
> [<f8f284d3>] nfsd+0x0/0x282 [nfsd]
> [<c010343f>] kernel_thread_helper+0x7/0x10
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Kernel 2.6.23.17 crash (Was: Kernel (2.6.24) crash on nfsd (BUG: soft lockup))
2008-03-01 16:39 ` J. Bruce Fields
@ 2008-03-01 17:03 ` Jens Axboe
2008-03-05 10:25 ` Gertjan Oude Lohuis
0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2008-03-01 17:03 UTC (permalink / raw)
To: J. Bruce Fields
Cc: Allard Hoeve, linux-nfs, Nilssen, Rune, Gertjan Oude Lohuis
On Sat, Mar 01 2008, J. Bruce Fields wrote:
> On Thu, Feb 28, 2008 at 11:56:51AM +0100, Allard Hoeve wrote:
> > After trying 2.6.23.17, the same happened. The stacktrace is a bit
> > different, but they are comparable.
> >
> > Is this an NFS problem in the first place? Where could we go for help
> > with this problem?
>
> Thanks for the reports!
>
> So, the summary: several people are reporting soft lockup warnings with
> __generic_file_splice_read as the latest or next-to-latest function on
> the stack. Sounds like 2.6.18 is good, various kernels around 2.6.23
> and 2.6.24 are reported bad. Is it possible this was a regression
> introduced by the splice changes?
I posted this two days ago, but didn't get a reply back regarding if
anyone who can reproduce tested it?
diff --git a/fs/splice.c b/fs/splice.c
index 9b559ee..0254ec6 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -370,8 +370,10 @@ __generic_file_splice_read(struct file *in, loff_t *ppos,
 			 * for an in-flight io page
 			 */
 			if (flags & SPLICE_F_NONBLOCK) {
-				if (TestSetPageLocked(page))
+				if (TestSetPageLocked(page)) {
+					error = -EAGAIN;
 					break;
+				}
 			} else
 				lock_page(page);
 
@@ -479,9 +481,8 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
 				 struct pipe_inode_info *pipe, size_t len,
 				 unsigned int flags)
 {
-	ssize_t spliced;
-	int ret;
 	loff_t isize, left;
+	int ret;
 
 	isize = i_size_read(in->f_mapping->host);
 	if (unlikely(*ppos >= isize))
@@ -491,29 +492,9 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
 	if (unlikely(left < len))
 		len = left;
 
-	ret = 0;
-	spliced = 0;
-	while (len && !spliced) {
-		ret = __generic_file_splice_read(in, ppos, pipe, len, flags);
-
-		if (ret < 0)
-			break;
-		else if (!ret) {
-			if (spliced)
-				break;
-			if (flags & SPLICE_F_NONBLOCK) {
-				ret = -EAGAIN;
-				break;
-			}
-		}
-
+	ret = __generic_file_splice_read(in, ppos, pipe, len, flags);
+	if (ret > 0)
 		*ppos += ret;
-		len -= ret;
-		spliced += ret;
-	}
-
-	if (spliced)
-		return spliced;
 
 	return ret;
 }
--
Jens Axboe
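The loop removed by the patch above can be modelled in user space to see why it might spin. This is a minimal sketch, not kernel code: `stub_splice_read` is a hypothetical stand-in for `__generic_file_splice_read()` that always reports 0 bytes, and `old_loop`, `patched_call`, `SPUN_OUT` and the `max_iter` cap are illustrative names. Whether a persistent zero return is what actually happened on the affected servers is an assumption that the thread does not confirm.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SPLICE_F_NONBLOCK 0x02   /* flag value as in the kernel headers */
#define SPUN_OUT (-1000L)        /* hypothetical marker: iteration cap hit */

/* Hypothetical stand-in for __generic_file_splice_read(): never makes
 * progress, i.e. always reports that 0 bytes were spliced. */
static long stub_splice_read(size_t len, unsigned int flags)
{
    (void)len;
    (void)flags;
    return 0;
}

/* Model of the pre-patch loop in generic_file_splice_read(). The max_iter
 * cap exists only so this demo terminates; the kernel loop had no cap. */
static long old_loop(size_t len, unsigned int flags,
                     unsigned long max_iter, unsigned long *iters)
{
    long ret = 0;
    long spliced = 0;

    *iters = 0;
    while (len && !spliced) {
        if (*iters == max_iter)
            return SPUN_OUT;
        (*iters)++;

        ret = stub_splice_read(len, flags);
        if (ret < 0)
            break;
        else if (!ret) {
            if (spliced)
                break;
            if (flags & SPLICE_F_NONBLOCK) {
                ret = -EAGAIN;
                break;
            }
            /* Blocking case: falls through with len and spliced
             * unchanged, so the loop condition stays true. */
        }
        len -= (size_t)ret;
        spliced += ret;
    }
    return spliced ? spliced : ret;
}

/* Model of the patched function: one call, result returned as-is. */
static long patched_call(size_t len, unsigned int flags)
{
    return stub_splice_read(len, flags);
}
```

With this stub, `old_loop(4096, 0, 1000, &n)` runs into the cap, the `SPLICE_F_NONBLOCK` case leaves after one pass with `-EAGAIN`, and `patched_call` simply returns 0 to its caller.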
* Re: Kernel 2.6.23.17 crash (Was: Kernel (2.6.24) crash on nfsd (BUG: soft lockup))
2008-03-01 17:03 ` Jens Axboe
@ 2008-03-05 10:25 ` Gertjan Oude Lohuis
0 siblings, 0 replies; 8+ messages in thread
From: Gertjan Oude Lohuis @ 2008-03-05 10:25 UTC (permalink / raw)
To: Jens Axboe; +Cc: J. Bruce Fields, Allard Hoeve, linux-nfs, Nilssen, Rune
Hi Jens et al,

On 03/01/2008 06:03 PM, Jens Axboe wrote:
> On Sat, Mar 01 2008, J. Bruce Fields wrote:
>> So, the summary: several people are reporting soft lockup warnings with
>> _generic_file_splice_read as the latest or next-to-latest function on
>> the stack. Sounds like 2.6.18 is good, various kernels around 2.6.23
>> and 2.6.24 are reported bad. Is it possible this was a regression
>> introduced by the splice changes?
>
> I posted this two days ago, but didn't get a reply back regarding if
> anyone who can reproduce tested it?
>
> diff --git a/fs/splice.c b/fs/splice.c
<snip patch>

I'm sorry we didn't respond any earlier. We've been quite busy dividing
our data over multiple fileservers to lower the load on the primary
server, and in the process we downgraded the kernels on the NFS-servers
to 2.6.22.19. Since then we haven't seen another crash. My gut feeling
says that the downgraded kernels were the 'solution', but it could also
be that the lowered load has prevented the servers from crashing.

At the moment we won't be able to test your patch, simply because we
can't afford any more crashes. However, if 2.6.22.19 does crash in the
same way in the near future, I'll try your patch.

Thanks for your interest and help!

Regards,
Gertjan Oude Lohuis
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Kernel (2.6.24) crash on nfsd (BUG: soft lockup)
[not found] ` <47C434D2.80601-DW70C6hi67U@public.gmane.org>
2008-02-27 6:46 ` Gertjan Oude Lohuis
@ 2008-02-28 11:08 ` Gertjan Oude Lohuis
1 sibling, 0 replies; 8+ messages in thread
From: Gertjan Oude Lohuis @ 2008-02-28 11:08 UTC (permalink / raw)
To: linux-nfs; +Cc: rune.nilssen-FJFKJQU35qU
[-- Attachment #1: Type: text/plain, Size: 840 bytes --]
Over the past few days, our fileserver has crashed with this bug a
couple of times. We downgraded the kernel to 2.6.23.17 last night, but
about an hour ago the machine crashed again, this time with a slightly
different stacktrace (attached).

This is driving us nuts: kernel 2.6.17 to 2.6.22 are not possible,
because of the lockd issue, and 2.6.23 and 2.6.24 are not possible
because it crashes even more often.

I noticed that Rune Nilssen reported the same issue a few days ago to
this list, but he too hasn't received any response yet
(http://article.gmane.org/gmane.linux.nfs/19105).

Any more people here that suffer from this issue? Can I get/give more
information to make debugging easier?
--
Met vriendelijke groet,
Gertjan Oude Lohuis
Byte Internet
W www.byte.nl
E support-DW70C6hi67U@public.gmane.org
F 020 6255 922
[-- Attachment #2: stacktrace3.txt --]
[-- Type: text/plain, Size: 3327 bytes --]
BUG: soft lockup - CPU#3 stuck for 11s! [nfsd:2643]
Pid: 2643, comm: nfsd
EIP: 0060:[<c0179a3a>] CPU: 3
EIP is at __generic_file_splice_read+0x12c/0x418
EFLAGS: 00000206 Not tainted (2.6.23.17-fwsh-byte #3)
EAX: f6e9dddc EBX: 00001000 ECX: 00000001 EDX: 00000000
ESI: 00000000 EDI: f6e9dcd0 EBP: 00000095 DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: b7e72cc0 CR3: 00622000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
[<c0113cbb>] __check_preempt_curr_fair+0x4b/0x7d
[<c0113dcb>] entity_tick+0x47/0x54
[<c013006b>] getnstimeofday+0x37/0x111
[<c0132fb6>] clockevents_program_event+0xac/0xcc
[<c0122996>] run_timer_softirq+0x30/0x184
[<c012f36f>] hrtimer_interrupt+0x132/0x1c4
[<c011f0e0>] __do_softirq+0xba/0xcf
[<c010da6a>] smp_apic_timer_interrupt+0x2c/0x35
[<c01032bc>] apic_timer_interrupt+0x28/0x30
[<c0179da7>] generic_file_splice_read+0x81/0xd5
[<c017a6b0>] do_splice_to+0x75/0x97
[<c017a771>] splice_direct_to_actor+0x9f/0x166
[<f8f2a494>] nfsd_acceptable+0x0/0xd1 [nfsd]
[<f8f2c247>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
[<f8f2c5ea>] nfsd_vfs_read+0x399/0x3bd [nfsd]
[<c015d57f>] dentry_open+0x34/0x64
[<f8f2ca1d>] nfsd_read+0xee/0xfb [nfsd]
[<f8f332ab>] nfsd3_proc_read+0xfe/0x186 [nfsd]
[<f8f34cd9>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
[<f8f28847>] nfsd_dispatch+0xc5/0x1ca [nfsd]
[<c043ab82>] svcauth_unix_set_client+0x116/0x165
[<c0436b96>] svc_process+0x4fb/0x6d4
[<c01164ad>] default_wake_function+0x0/0xc
[<f8f2863d>] nfsd+0x16a/0x282 [nfsd]
[<f8f284d3>] nfsd+0x0/0x282 [nfsd]
[<c010343f>] kernel_thread_helper+0x7/0x10
=======================
Pid: 2643, comm: nfsd
EIP: 0060:[<c01799ed>] CPU: 3
EIP is at __generic_file_splice_read+0xdf/0x418
EFLAGS: 00000206 Not tainted (2.6.23.17-fwsh-byte #3)
EAX: 00000095 EBX: f6e9de50 ECX: 00000001 EDX: 00000000
ESI: 00000001 EDI: f6e9dcd0 EBP: 00000096 DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: b7e72cc0 CR3: 00622000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
[<c0113cbb>] __check_preempt_curr_fair+0x4b/0x7d
[<c0113dcb>] entity_tick+0x47/0x54
[<c013006b>] getnstimeofday+0x37/0x111
[<c0132fb6>] clockevents_program_event+0xac/0xcc
[<c0122996>] run_timer_softirq+0x30/0x184
[<c012f36f>] hrtimer_interrupt+0x132/0x1c4
[<c011f0e0>] __do_softirq+0xba/0xcf
[<c010da6a>] smp_apic_timer_interrupt+0x2c/0x35
[<c01032bc>] apic_timer_interrupt+0x28/0x30
[<c017007b>] find_inode_fast+0x26/0x46
[<c0179da7>] generic_file_splice_read+0x81/0xd5
[<c017a6b0>] do_splice_to+0x75/0x97
[<c017a771>] splice_direct_to_actor+0x9f/0x166
[<f8f2a494>] nfsd_acceptable+0x0/0xd1 [nfsd]
[<f8f2c247>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
[<f8f2c5ea>] nfsd_vfs_read+0x399/0x3bd [nfsd]
[<c015d57f>] dentry_open+0x34/0x64
[<f8f2ca1d>] nfsd_read+0xee/0xfb [nfsd]
[<f8f332ab>] nfsd3_proc_read+0xfe/0x186 [nfsd]
[<f8f34cd9>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
[<f8f28847>] nfsd_dispatch+0xc5/0x1ca [nfsd]
[<c043ab82>] svcauth_unix_set_client+0x116/0x165
[<c0436b96>] svc_process+0x4fb/0x6d4
[<c01164ad>] default_wake_function+0x0/0xc
[<f8f2863d>] nfsd+0x16a/0x282 [nfsd]
[<f8f284d3>] nfsd+0x0/0x282 [nfsd]
[<c010343f>] kernel_thread_helper+0x7/0x10
=======================
^ permalink raw reply [flat|nested] 8+ messages in thread
Thread overview: 8+ messages
2008-02-26 15:48 Kernel (2.6.24) crash on nfsd (BUG: soft lockup) Gertjan Oude Lohuis
[not found] ` <47C434D2.80601-DW70C6hi67U@public.gmane.org>
2008-02-27 6:46 ` Gertjan Oude Lohuis
[not found] ` <47C50754.5030107-DW70C6hi67U@public.gmane.org>
2008-02-27 7:01 ` Gertjan Oude Lohuis
[not found] ` <47C50ABB.8050700-DW70C6hi67U@public.gmane.org>
2008-02-28 10:56 ` Kernel 2.6.23.17 crash (Was: Kernel (2.6.24) crash on nfsd (BUG: soft lockup)) Allard Hoeve
[not found] ` <Pine.LNX.4.62.0802281153040.31013-FHjt3+7qfYHBZBx2VKNGNcSTQT6m/s+e@public.gmane.org>
2008-03-01 16:39 ` J. Bruce Fields
2008-03-01 17:03 ` Jens Axboe
2008-03-05 10:25 ` Gertjan Oude Lohuis
2008-02-28 11:08 ` Kernel (2.6.24) crash on nfsd (BUG: soft lockup) Gertjan Oude Lohuis