Kernel (2.6.24) crash on nfsd (BUG: soft lockup)

Linux NFS development

* Kernel (2.6.24) crash on nfsd (BUG: soft lockup)
@ 2008-02-26 15:48 Gertjan Oude Lohuis
       [not found] ` <47C434D2.80601-DW70C6hi67U@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Gertjan Oude Lohuis @ 2008-02-26 15:48 UTC
  To: linux-nfs

[-- Attachment #1: Type: text/plain, Size: 722 bytes --]

Hi!
One of our fileservers went down pretty hard yesterday. We recently 
upgraded the kernel to 2.6.24 because we suffered from the lockd-lockup 
with our previous kernel (2.6.18).
The server stopped responding completely to any requests (nfs, ssh, 
ping) and every few seconds a stacktrace was dumped on the console. The 
stacktraces hint at nfsd (Pid: 2716, comm: nfsd Not tainted 
(2.6.24.2-fwsh-byte #2) and various nfs-functions in the trace). I 
attached some of them to this message.

Do these stacktraces seem familiar to anyone? I couldn't find any
similar crashes with Google.

-- 

Kind regards,

Gertjan Oude Lohuis
Byte Internet

W www.byte.nl
E support-DW70C6hi67U@public.gmane.org
F 020 6255 922

[-- Attachment #2: stacktrace.txt --]
[-- Type: text/plain, Size: 4973 bytes --]

BUG: soft lockup - CPU#0 stuck for 11s! [nfsd:2716]

Pid: 2716, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
EIP: 0060:[<c0140967>] EFLAGS: 00000286 CPU: 0
EIP is at find_get_pages_contig+0x67/0x73
EAX: 00000000 EBX: 00000001 ECX: c25cc520 EDX: c25cc520
ESI: 00000078 EDI: ca2fbdbc EBP: 00000001 ESP: dffb5c6c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7f5d000 CR3: 1fc45000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c017c49c>] __generic_file_splice_read+0xa2/0x41e
 [<c0113b11>] sched_slice+0x15/0x6f
 [<c0131291>] getnstimeofday+0x31/0x105
 [<c0134301>] clockevents_program_event+0xbf/0x134
 [<c012ef49>] ktime_get_ts+0x15/0x47
 [<c01231ea>] run_timer_softirq+0x30/0x184
 [<c012a893>] __rcu_process_callbacks+0x76/0xbb
 [<c011f979>] tasklet_action+0x53/0x93
 [<c011f754>] __do_softirq+0xba/0xcf
 [<c010e20e>] smp_apic_timer_interrupt+0x2c/0x35
 [<c01032e0>] apic_timer_interrupt+0x28/0x30
 [<c017c88d>] generic_file_splice_read+0x75/0xc9
 [<c017d083>] do_splice_to+0x6e/0x90
 [<c017d144>] splice_direct_to_actor+0x9f/0x166
 [<f8f2cf72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
 [<c017c818>] generic_file_splice_read+0x0/0xc9
 [<f8f2d309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
 [<f8f2b3b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
 [<c016014f>] dentry_open+0x34/0x64
 [<f8f2d73c>] nfsd_read+0xee/0xfb [nfsd]
 [<f8f33b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
 [<f8f354cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
 [<f8f29855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
 [<c0445ab2>] svcauth_unix_set_client+0x116/0x165
 [<c0441ad1>] svc_process+0x4e9/0x6b4
 [<c01168e2>] default_wake_function+0x0/0x8
 [<f8f2963d>] nfsd+0x16a/0x290 [nfsd]
 [<f8f294d3>] nfsd+0x0/0x290 [nfsd]
 [<c0103463>] kernel_thread_helper+0x7/0x10
 =======================


BUG: soft lockup - CPU#0 stuck for 11s! [nfsd:2716]

Pid: 2716, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
EIP: 0060:[<c0140967>] EFLAGS: 00000286 CPU: 0
EIP is at find_get_pages_contig+0x67/0x73
EAX: 00000000 EBX: 00000001 ECX: c25cc520 EDX: c25cc520
ESI: 00000078 EDI: ca2fbdbc EBP: 00000001 ESP: dffb5c6c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7f5d000 CR3: 1fc45000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c017c49c>] __generic_file_splice_read+0xa2/0x41e
 [<c0132efc>] clocksource_get_next+0x3a/0x40
 [<c0113b11>] sched_slice+0x15/0x6f
 [<c0131291>] getnstimeofday+0x31/0x105
 [<c0134301>] clockevents_program_event+0xbf/0x134
 [<c012ef49>] ktime_get_ts+0x15/0x47
 [<c01231ea>] run_timer_softirq+0x30/0x184
 [<c012a893>] __rcu_process_callbacks+0x76/0xbb
 [<c011f979>] tasklet_action+0x53/0x93
 [<c011f754>] __do_softirq+0xba/0xcf
 [<c010e20e>] smp_apic_timer_interrupt+0x2c/0x35
 [<c01032e0>] apic_timer_interrupt+0x28/0x30
 [<c017007b>] locks_show+0x0/0x67
 [<c017c88d>] generic_file_splice_read+0x75/0xc9
 [<c017d083>] do_splice_to+0x6e/0x90
 [<c017d144>] splice_direct_to_actor+0x9f/0x166
 [<f8f2cf72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
 [<c017c818>] generic_file_splice_read+0x0/0xc9
 [<f8f2d309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
 [<f8f2b3b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
 [<c016014f>] dentry_open+0x34/0x64
 [<f8f2d73c>] nfsd_read+0xee/0xfb [nfsd]
 [<f8f33b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
 [<f8f354cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
 [<f8f29855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
 [<c0445ab2>] svcauth_unix_set_client+0x116/0x165
 [<c0441ad1>] svc_process+0x4e9/0x6b4
 [<c01168e2>] default_wake_function+0x0/0x8
 [<f8f2963d>] nfsd+0x16a/0x290 [nfsd]
 [<f8f294d3>] nfsd+0x0/0x290 [nfsd]
 [<c0103463>] kernel_thread_helper+0x7/0x10
 =======================


BUG: soft lockup - CPU#0 stuck for 11s! [nfsd:2716]

Pid: 2716, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
EIP: 0060:[<c017c88f>] EFLAGS: 00000246 CPU: 0
EIP is at generic_file_splice_read+0x77/0xc9
EAX: 00000000 EBX: 00000000 ECX: 00000001 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: 00001000 ESP: dffb5df0
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7f5d000 CR3: 1fc45000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c017d083>] do_splice_to+0x6e/0x90
 [<c017d144>] splice_direct_to_actor+0x9f/0x166
 [<f8f2cf72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
 [<c017c818>] generic_file_splice_read+0x0/0xc9
 [<f8f2d309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
 [<f8f2b3b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
 [<c016014f>] dentry_open+0x34/0x64
 [<f8f2d73c>] nfsd_read+0xee/0xfb [nfsd]
 [<f8f33b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
 [<f8f354cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
 [<f8f29855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
 [<c0445ab2>] svcauth_unix_set_client+0x116/0x165
 [<c0441ad1>] svc_process+0x4e9/0x6b4
 [<c01168e2>] default_wake_function+0x0/0x8
 [<f8f2963d>] nfsd+0x16a/0x290 [nfsd]
 [<f8f294d3>] nfsd+0x0/0x290 [nfsd]
 [<c0103463>] kernel_thread_helper+0x7/0x10
 ======================= 


* Re: Kernel (2.6.24) crash on nfsd (BUG: soft lockup)
       [not found] ` <47C434D2.80601-DW70C6hi67U@public.gmane.org>
@ 2008-02-27  6:46   ` Gertjan Oude Lohuis
       [not found]     ` <47C50754.5030107-DW70C6hi67U@public.gmane.org>
  2008-02-28 11:08   ` Kernel (2.6.24) crash on nfsd (BUG: soft lockup) Gertjan Oude Lohuis
  1 sibling, 1 reply; 8+ messages in thread
From: Gertjan Oude Lohuis @ 2008-02-27  6:46 UTC
  To: linux-nfs

[-- Attachment #1: Type: text/plain, Size: 918 bytes --]

Gertjan Oude Lohuis wrote:
> One of our fileservers went down pretty hard yesterday. We recently 
> upgraded the kernel to 2.6.24 because we suffered from the 
> lockd-lockup with our previous kernel (2.6.18).
> The server stopped responding completely to any requests (nfs, ssh, 
> ping) and every few seconds a stacktrace was dumped on the console. 
> The stacktraces hint at nfsd (Pid: 2716, comm: nfsd Not tainted 
> (2.6.24.2-fwsh-byte #2) and various nfs-functions in the trace). I 
> attached some of them to this message.

This morning the same server crashed again, with the same stacktrace (at 
least to my eyes :-)). I think we'll be downgrading to 2.6.23 as soon as 
possible. Is there anything I can do to get more debug information? Now 
or when it crashes? When the server crashes, I'm able to log in to it
via the serial console, and reboot it with 'send break -> b'.

Regards,
Gertjan Oude Lohuis


[-- Attachment #2: stacktrace.txt --]
[-- Type: text/plain, Size: 3610 bytes --]

BUG: soft lockup - CPU#2 stuck for 11s! [nfsd:2775]

Pid: 2775, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
EIP: 0060:[<c0140967>] EFLAGS: 00000286 CPU: 2
EIP is at find_get_pages_contig+0x67/0x73
EAX: 00000000 EBX: 00000002 ECX: c235c8e0 EDX: c235c8e0
ESI: 00000110 EDI: d7f2c69c EBP: 00000002 ESP: f604fc6c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7fb8000 CR3: 355be000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c017c49c>] __generic_file_splice_read+0xa2/0x41e
 [<c0113b11>] sched_slice+0x15/0x6f
 [<c0131291>] getnstimeofday+0x31/0x105
 [<c0134301>] clockevents_program_event+0xbf/0x134
 [<c012ef49>] ktime_get_ts+0x15/0x47
 [<c01231ea>] run_timer_softirq+0x30/0x184
 [<c012a893>] __rcu_process_callbacks+0x76/0xbb
 [<c011f979>] tasklet_action+0x53/0x93
 [<c011f754>] __do_softirq+0xba/0xcf
 [<c010e20e>] smp_apic_timer_interrupt+0x2c/0x35
 [<c01032e0>] apic_timer_interrupt+0x28/0x30
 [<c017c88d>] generic_file_splice_read+0x75/0xc9
 [<c017d083>] do_splice_to+0x6e/0x90
 [<c017d144>] splice_direct_to_actor+0x9f/0x166
 [<f8f32f72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
 [<c017c818>] generic_file_splice_read+0x0/0xc9
 [<f8f33309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
 [<f8f313b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
 [<c016014f>] dentry_open+0x34/0x64
 [<f8f3373c>] nfsd_read+0xee/0xfb [nfsd]
 [<f8f39b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
 [<f8f3b4cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
 [<f8f2f855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
 [<c0445ab2>] svcauth_unix_set_client+0x116/0x165
 [<c0441ad1>] svc_process+0x4e9/0x6b4
 [<c01168e2>] default_wake_function+0x0/0x8
 [<f8f2f63d>] nfsd+0x16a/0x290 [nfsd]
 [<f8f2f4d3>] nfsd+0x0/0x290 [nfsd]
 [<c0103463>] kernel_thread_helper+0x7/0x10
 =======================

BUG: soft lockup - CPU#2 stuck for 11s! [nfsd:2775]

Pid: 2775, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
EIP: 0060:[<c0140967>] EFLAGS: 00000286 CPU: 2
EIP is at find_get_pages_contig+0x67/0x73
EAX: 00000000 EBX: 00000002 ECX: c235c8e0 EDX: c235c8e0
ESI: 00000110 EDI: d7f2c69c EBP: 00000002 ESP: f604fc6c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7fb8000 CR3: 355be000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c017c49c>] __generic_file_splice_read+0xa2/0x41e
 [<c0113b11>] sched_slice+0x15/0x6f
 [<c0131291>] getnstimeofday+0x31/0x105
 [<c0134301>] clockevents_program_event+0xbf/0x134
 [<c012ef49>] ktime_get_ts+0x15/0x47
 [<c01231ea>] run_timer_softirq+0x30/0x184
 [<c012a893>] __rcu_process_callbacks+0x76/0xbb
 [<c011f979>] tasklet_action+0x53/0x93
 [<c011f754>] __do_softirq+0xba/0xcf
 [<c010e20e>] smp_apic_timer_interrupt+0x2c/0x35
 [<c01032e0>] apic_timer_interrupt+0x28/0x30
 [<c017c88d>] generic_file_splice_read+0x75/0xc9
 [<c017d083>] do_splice_to+0x6e/0x90
 [<c017d144>] splice_direct_to_actor+0x9f/0x166
 [<f8f32f72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
 [<c017c818>] generic_file_splice_read+0x0/0xc9
 [<f8f33309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
 [<f8f313b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
 [<c016014f>] dentry_open+0x34/0x64
 [<f8f3373c>] nfsd_read+0xee/0xfb [nfsd]
 [<f8f39b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
 [<f8f3b4cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
 [<f8f2f855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
 [<c0445ab2>] svcauth_unix_set_client+0x116/0x165
 [<c0441ad1>] svc_process+0x4e9/0x6b4
 [<c01168e2>] default_wake_function+0x0/0x8
 [<f8f2f63d>] nfsd+0x16a/0x290 [nfsd]
 [<f8f2f4d3>] nfsd+0x0/0x290 [nfsd]
 [<c0103463>] kernel_thread_helper+0x7/0x10
 =======================



* Re: Kernel (2.6.24) crash on nfsd (BUG: soft lockup)
       [not found]     ` <47C50754.5030107-DW70C6hi67U@public.gmane.org>
@ 2008-02-27  7:01       ` Gertjan Oude Lohuis
       [not found]         ` <47C50ABB.8050700-DW70C6hi67U@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Gertjan Oude Lohuis @ 2008-02-27  7:01 UTC
  To: linux-nfs

[-- Attachment #1: Type: text/plain, Size: 843 bytes --]

Gertjan Oude Lohuis wrote:
> This morning the same server crashed again, with the same stacktrace 
> (at least to my eyes :-)). I think we'll be downgrading to 2.6.23 as 
> soon as possible. Is there anything I can do to get more debug 
> information? Now or when it crashes? When the server crashes, I'm able
> to log in to it via the serial console, and reboot it with 'send
> break -> b'.

This keeps getting weirder. While browsing the server's log files, I
noticed exactly the same errors in /var/log/messages from early
yesterday morning, around 1:52 AM. However, the server did not crash then.
We didn't notice earlier because most notifications are suppressed
during the night. Apparently, Linux can recover from this bug, given
enough time.
Can anyone help me understand this problem?
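
A rough sketch of how earlier occurrences like these could be located in a syslog file. It assumes the line layout shown in the attached log (timestamp, host, "kernel:", then the soft lockup header); the helper name `find_soft_lockups` and the sample lines are illustrative, not part of any existing tool.

```python
import re

# Syslog-prefixed soft lockup header, e.g.
# "Feb 26 01:52:00 file1 kernel: BUG: soft lockup - CPU#3 stuck for 11s! [nfsd:2775]"
# The layout is an assumption taken from the attached log.
LOCKUP_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) kernel: "
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^:\]]+):(\d+)\]"
)


def find_soft_lockups(lines):
    """Yield (timestamp, host, cpu, seconds, comm, pid) per lockup header line."""
    for line in lines:
        m = LOCKUP_RE.match(line)
        if m:
            ts, host, cpu, secs, comm, pid = m.groups()
            yield ts, host, int(cpu), int(secs), comm, int(pid)


# Hypothetical sample, copied from the first lines of the attached log.
SAMPLE_LOG = [
    "Feb 26 01:52:00 file1 kernel: BUG: soft lockup - CPU#3 stuck for 11s! [nfsd:2775]",
    "Feb 26 01:52:00 file1 kernel: EIP is at put_page+0x9/0x20",
    "Feb 26 01:52:14 file1 kernel: BUG: soft lockup - CPU#3 stuck for 11s! [nfsd:2775]",
]

if __name__ == "__main__":
    for hit in find_soft_lockups(SAMPLE_LOG):
        print(hit)
```

Passing an open file object for /var/log/messages instead of `SAMPLE_LOG` should work the same way, since the function only iterates over lines.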

Regards,
Gertjan Oude Lohuis

[-- Attachment #2: stacktrace2.txt --]
[-- Type: text/plain, Size: 12059 bytes --]

Feb 26 01:52:00 file1 kernel: BUG: soft lockup - CPU#3 stuck for 11s! [nfsd:2775]
Feb 26 01:52:00 file1 kernel:
Feb 26 01:52:00 file1 kernel: Pid: 2775, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
Feb 26 01:52:00 file1 kernel: EIP: 0060:[<c0147be0>] EFLAGS: 00000246 CPU: 3
Feb 26 01:52:00 file1 kernel: EIP is at put_page+0x9/0x20
Feb 26 01:52:00 file1 kernel: EAX: 80000008 EBX: 00000000 ECX: 00000002 EDX: c2a71240
Feb 26 01:52:00 file1 kernel: ESI: 00000000 EDI: e6ee08fc EBP: 00000087 ESP: f604fc7c
Feb 26 01:52:00 file1 kernel: DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Feb 26 01:52:00 file1 kernel: CR0: 8005003b CR2: 080a7070 CR3: 36cbd000 CR4: 000006f0
Feb 26 01:52:00 file1 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Feb 26 01:52:00 file1 kernel: DR6: ffff0ff0 DR7: 00000400
Feb 26 01:52:00 file1 kernel: [<c017c6bc>] __generic_file_splice_read+0x2c2/0x41e
Feb 26 01:52:00 file1 kernel: [<c0113b11>] sched_slice+0x15/0x6f
Feb 26 01:52:00 file1 kernel: [<c0131291>] getnstimeofday+0x31/0x105
Feb 26 01:52:01 file1 kernel: [<c0134301>] clockevents_program_event+0xbf/0x134
Feb 26 01:52:01 file1 kernel: [<c012ef49>] ktime_get_ts+0x15/0x47
Feb 26 01:52:01 file1 kernel: [<c01231ea>] run_timer_softirq+0x30/0x184
Feb 26 01:52:01 file1 kernel: [<c012a893>] __rcu_process_callbacks+0x76/0xbb
Feb 26 01:52:01 file1 kernel: [<c011f979>] tasklet_action+0x53/0x93
Feb 26 01:52:01 file1 kernel: [<c011f754>] __do_softirq+0xba/0xcf
Feb 26 01:52:01 file1 kernel: [<c010e20e>] smp_apic_timer_interrupt+0x2c/0x35
Feb 26 01:52:01 file1 kernel: [<c01032e0>] apic_timer_interrupt+0x28/0x30
Feb 26 01:52:01 file1 kernel: [<c017c88d>] generic_file_splice_read+0x75/0xc9
Feb 26 01:52:01 file1 kernel: [<c017d083>] do_splice_to+0x6e/0x90
Feb 26 01:52:01 file1 kernel: [<c017d144>] splice_direct_to_actor+0x9f/0x166
Feb 26 01:52:01 file1 kernel: [<f8f32f72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
Feb 26 01:52:01 file1 kernel: [<c017c818>] generic_file_splice_read+0x0/0xc9
Feb 26 01:52:01 file1 kernel: [<f8f33309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
Feb 26 01:52:01 file1 kernel: [<f8f313b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
Feb 26 01:52:01 file1 kernel: [<c016014f>] dentry_open+0x34/0x64
Feb 26 01:52:01 file1 kernel: [<f8f3373c>] nfsd_read+0xee/0xfb [nfsd]
Feb 26 01:52:01 file1 kernel: [<f8f39b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
Feb 26 01:52:01 file1 kernel: [<f8f3b4cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
Feb 26 01:52:01 file1 kernel: [<f8f2f855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
Feb 26 01:52:01 file1 kernel: [<c0445ab2>] svcauth_unix_set_client+0x116/0x165
Feb 26 01:52:02 file1 kernel: [<c0441ad1>] svc_process+0x4e9/0x6b4
Feb 26 01:52:02 file1 kernel: [<c01168e2>] default_wake_function+0x0/0x8
Feb 26 01:52:02 file1 kernel: [<f8f2f63d>] nfsd+0x16a/0x290 [nfsd]
Feb 26 01:52:02 file1 kernel: [<f8f2f4d3>] nfsd+0x0/0x290 [nfsd]
Feb 26 01:52:02 file1 kernel: [<c0103463>] kernel_thread_helper+0x7/0x10
Feb 26 01:52:02 file1 kernel: =======================
Feb 26 01:52:14 file1 kernel: BUG: soft lockup - CPU#3 stuck for 11s! [nfsd:2775]
Feb 26 01:52:14 file1 kernel:
Feb 26 01:52:14 file1 kernel: Pid: 2775, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
Feb 26 01:52:14 file1 kernel: EIP: 0060:[<c0140967>] EFLAGS: 00000286 CPU: 3
Feb 26 01:52:14 file1 kernel: EIP is at find_get_pages_contig+0x67/0x73
Feb 26 01:52:14 file1 kernel: EAX: 00000000 EBX: 00000002 ECX: c2a71260 EDX: c2a71260
Feb 26 01:52:14 file1 kernel: ESI: 00000089 EDI: e6ee09ac EBP: 00000002 ESP: f604fc6c
Feb 26 01:52:14 file1 kernel: DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Feb 26 01:52:14 file1 kernel: CR0: 8005003b CR2: 080a7070 CR3: 36cbd000 CR4: 000006f0
Feb 26 01:52:14 file1 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Feb 26 01:52:14 file1 kernel: DR6: ffff0ff0 DR7: 00000400
Feb 26 01:52:14 file1 kernel: [<c017c49c>] __generic_file_splice_read+0xa2/0x41e
Feb 26 01:52:14 file1 kernel: [<c0113b11>] sched_slice+0x15/0x6f
Feb 26 01:52:14 file1 kernel: [<c0131291>] getnstimeofday+0x31/0x105
Feb 26 01:52:14 file1 kernel: [<c0134301>] clockevents_program_event+0xbf/0x134
Feb 26 01:52:14 file1 kernel: [<c012ef49>] ktime_get_ts+0x15/0x47
Feb 26 01:52:14 file1 kernel: [<c01231ea>] run_timer_softirq+0x30/0x184
Feb 26 01:52:14 file1 kernel: [<c012a893>] __rcu_process_callbacks+0x76/0xbb
Feb 26 01:52:14 file1 kernel: [<c011f979>] tasklet_action+0x53/0x93
Feb 26 01:52:14 file1 kernel: [<c011f754>] __do_softirq+0xba/0xcf
Feb 26 01:52:14 file1 kernel: [<c010e20e>] smp_apic_timer_interrupt+0x2c/0x35
Feb 26 01:52:15 file1 kernel: [<c01032e0>] apic_timer_interrupt+0x28/0x30
Feb 26 01:52:15 file1 kernel: [<c017c88d>] generic_file_splice_read+0x75/0xc9
Feb 26 01:52:15 file1 kernel: [<c017d083>] do_splice_to+0x6e/0x90
Feb 26 01:52:15 file1 kernel: [<c017d144>] splice_direct_to_actor+0x9f/0x166
Feb 26 01:52:15 file1 kernel: [<f8f32f72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
Feb 26 01:52:15 file1 kernel: [<c017c818>] generic_file_splice_read+0x0/0xc9
Feb 26 01:52:15 file1 kernel: [<f8f33309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
Feb 26 01:52:15 file1 kernel: [<f8f313b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
Feb 26 01:52:15 file1 kernel: [<c016014f>] dentry_open+0x34/0x64
Feb 26 01:52:15 file1 kernel: [<f8f3373c>] nfsd_read+0xee/0xfb [nfsd]
Feb 26 01:52:15 file1 kernel: [<f8f39b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
Feb 26 01:52:15 file1 kernel: [<f8f3b4cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
Feb 26 01:52:15 file1 kernel: [<f8f2f855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
Feb 26 01:52:15 file1 kernel: [<c0445ab2>] svcauth_unix_set_client+0x116/0x165
Feb 26 01:52:15 file1 kernel: [<c0441ad1>] svc_process+0x4e9/0x6b4
Feb 26 01:52:15 file1 kernel: [<c01168e2>] default_wake_function+0x0/0x8
Feb 26 01:52:15 file1 kernel: [<f8f2f63d>] nfsd+0x16a/0x290 [nfsd]
Feb 26 01:52:15 file1 kernel: [<f8f2f4d3>] nfsd+0x0/0x290 [nfsd]
Feb 26 01:52:15 file1 kernel: [<c0103463>] kernel_thread_helper+0x7/0x10
Feb 26 01:52:15 file1 kernel: =======================
Feb 26 01:52:27 file1 kernel: BUG: soft lockup - CPU#3 stuck for 11s! [nfsd:2775]
Feb 26 01:52:27 file1 kernel:
Feb 26 01:52:27 file1 kernel: Pid: 2775, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
Feb 26 01:52:27 file1 kernel: EIP: 0060:[<c014096a>] EFLAGS: 00000286 CPU: 3
Feb 26 01:52:27 file1 kernel: EIP is at find_get_pages_contig+0x6a/0x73
Feb 26 01:52:27 file1 kernel: EAX: 00000002 EBX: 00000002 ECX: c2a71260 EDX: c2a71260
Feb 26 01:52:28 file1 kernel: ESI: 00000089 EDI: e6ee09ac EBP: 00000002 ESP: f604fc70
Feb 26 01:52:28 file1 kernel: DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Feb 26 01:52:28 file1 kernel: CR0: 8005003b CR2: 080a7070 CR3: 36cbd000 CR4: 000006f0
Feb 26 01:52:28 file1 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Feb 26 01:52:28 file1 kernel: DR6: ffff0ff0 DR7: 00000400
Feb 26 01:52:28 file1 kernel: [<c017c49c>] __generic_file_splice_read+0xa2/0x41e
Feb 26 01:52:28 file1 kernel: [<c0113b11>] sched_slice+0x15/0x6f
Feb 26 01:52:28 file1 kernel: [<c0131291>] getnstimeofday+0x31/0x105
Feb 26 01:52:28 file1 kernel: [<c0134301>] clockevents_program_event+0xbf/0x134
Feb 26 01:52:28 file1 kernel: [<c012ef49>] ktime_get_ts+0x15/0x47
Feb 26 01:52:28 file1 kernel: [<c01231ea>] run_timer_softirq+0x30/0x184
Feb 26 01:52:28 file1 kernel: [<c012a893>] __rcu_process_callbacks+0x76/0xbb
Feb 26 01:52:28 file1 kernel: [<c011f979>] tasklet_action+0x53/0x93
Feb 26 01:52:28 file1 kernel: [<c011f754>] __do_softirq+0xba/0xcf
Feb 26 01:52:28 file1 kernel: [<c010e20e>] smp_apic_timer_interrupt+0x2c/0x35
Feb 26 01:52:28 file1 kernel: [<c01032e0>] apic_timer_interrupt+0x28/0x30
Feb 26 01:52:28 file1 kernel: [<c01700d8>] locks_show+0x5d/0x67
Feb 26 01:52:28 file1 kernel: [<c017c88d>] generic_file_splice_read+0x75/0xc9
Feb 26 01:52:28 file1 kernel: [<c017d083>] do_splice_to+0x6e/0x90
Feb 26 01:52:28 file1 kernel: [<c017d144>] splice_direct_to_actor+0x9f/0x166
Feb 26 01:52:29 file1 kernel: [<f8f32f72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
Feb 26 01:52:29 file1 kernel: [<c017c818>] generic_file_splice_read+0x0/0xc9
Feb 26 01:52:29 file1 kernel: [<f8f33309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
Feb 26 01:52:29 file1 kernel: [<f8f313b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
Feb 26 01:52:29 file1 kernel: [<c016014f>] dentry_open+0x34/0x64
Feb 26 01:52:29 file1 kernel: [<f8f3373c>] nfsd_read+0xee/0xfb [nfsd]
Feb 26 01:52:29 file1 kernel: [<f8f39b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
Feb 26 01:52:29 file1 kernel: [<f8f3b4cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
Feb 26 01:52:29 file1 kernel: [<f8f2f855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
Feb 26 01:52:29 file1 kernel: [<c0445ab2>] svcauth_unix_set_client+0x116/0x165
Feb 26 01:52:29 file1 kernel: [<c0441ad1>] svc_process+0x4e9/0x6b4
Feb 26 01:52:29 file1 kernel: [<c01168e2>] default_wake_function+0x0/0x8
Feb 26 01:52:29 file1 kernel: [<f8f2f63d>] nfsd+0x16a/0x290 [nfsd]
Feb 26 01:52:29 file1 kernel: [<f8f2f4d3>] nfsd+0x0/0x290 [nfsd]
Feb 26 01:52:29 file1 kernel: [<c0103463>] kernel_thread_helper+0x7/0x10
Feb 26 01:52:29 file1 kernel: =======================
Feb 26 01:52:41 file1 kernel: BUG: soft lockup - CPU#3 stuck for 11s! [nfsd:2775]
Feb 26 01:52:41 file1 kernel:
Feb 26 01:52:41 file1 kernel: Pid: 2775, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
Feb 26 01:52:41 file1 kernel: EIP: 0060:[<c0140967>] EFLAGS: 00000286 CPU: 3
Feb 26 01:52:41 file1 kernel: EIP is at find_get_pages_contig+0x67/0x73
Feb 26 01:52:41 file1 kernel: EAX: 00000000 EBX: 00000002 ECX: c2a71260 EDX: c2a71260
Feb 26 01:52:41 file1 kernel: ESI: 00000089 EDI: e6ee09ac EBP: 00000002 ESP: f604fc6c
Feb 26 01:52:41 file1 kernel: DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Feb 26 01:52:41 file1 kernel: CR0: 8005003b CR2: 080a7070 CR3: 36cbd000 CR4: 000006f0
Feb 26 01:52:41 file1 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Feb 26 01:52:42 file1 kernel: DR6: ffff0ff0 DR7: 00000400
Feb 26 01:52:42 file1 kernel: [<c017c49c>] __generic_file_splice_read+0xa2/0x41e
Feb 26 01:52:42 file1 kernel: [<c0113b11>] sched_slice+0x15/0x6f
Feb 26 01:52:42 file1 kernel: [<c0131291>] getnstimeofday+0x31/0x105
Feb 26 01:52:42 file1 kernel: [<c0134301>] clockevents_program_event+0xbf/0x134
Feb 26 01:52:42 file1 kernel: [<c012ef49>] ktime_get_ts+0x15/0x47
Feb 26 01:52:42 file1 kernel: [<c01231ea>] run_timer_softirq+0x30/0x184
Feb 26 01:52:42 file1 kernel: [<c012a893>] __rcu_process_callbacks+0x76/0xbb
Feb 26 01:52:42 file1 kernel: [<c011f979>] tasklet_action+0x53/0x93
Feb 26 01:52:42 file1 kernel: [<c011f754>] __do_softirq+0xba/0xcf
Feb 26 01:52:42 file1 kernel: [<c010e20e>] smp_apic_timer_interrupt+0x2c/0x35
Feb 26 01:52:42 file1 kernel: [<c01032e0>] apic_timer_interrupt+0x28/0x30
Feb 26 01:52:42 file1 kernel: [<c017c88d>] generic_file_splice_read+0x75/0xc9
Feb 26 01:52:42 file1 kernel: [<c017d083>] do_splice_to+0x6e/0x90
Feb 26 01:52:42 file1 kernel: [<c017d144>] splice_direct_to_actor+0x9f/0x166
Feb 26 01:52:42 file1 kernel: [<f8f32f72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
Feb 26 01:52:42 file1 kernel: [<c017c818>] generic_file_splice_read+0x0/0xc9
Feb 26 01:52:42 file1 kernel: [<f8f33309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
Feb 26 01:52:42 file1 kernel: [<f8f313b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
Feb 26 01:52:42 file1 kernel: [<c016014f>] dentry_open+0x34/0x64
Feb 26 01:52:43 file1 kernel: [<f8f3373c>] nfsd_read+0xee/0xfb [nfsd]
Feb 26 01:52:43 file1 kernel: [<f8f39b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
Feb 26 01:52:43 file1 kernel: [<f8f3b4cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
Feb 26 01:52:43 file1 kernel: [<f8f2f855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
Feb 26 01:52:43 file1 kernel: [<c0445ab2>] svcauth_unix_set_client+0x116/0x165
Feb 26 01:52:43 file1 kernel: [<c0441ad1>] svc_process+0x4e9/0x6b4
Feb 26 01:52:43 file1 kernel: [<c01168e2>] default_wake_function+0x0/0x8
Feb 26 01:52:43 file1 kernel: [<f8f2f63d>] nfsd+0x16a/0x290 [nfsd]
Feb 26 01:52:43 file1 kernel: [<f8f2f4d3>] nfsd+0x0/0x290 [nfsd]
Feb 26 01:52:43 file1 kernel: [<c0103463>] kernel_thread_helper+0x7/0x10
Feb 26 01:52:43 file1 kernel: =======================


* Kernel 2.6.23.17 crash (Was: Kernel (2.6.24) crash on nfsd (BUG: soft lockup))
       [not found]         ` <47C50ABB.8050700-DW70C6hi67U@public.gmane.org>
@ 2008-02-28 10:56           ` Allard Hoeve
       [not found]             ` <Pine.LNX.4.62.0802281153040.31013-FHjt3+7qfYHBZBx2VKNGNcSTQT6m/s+e@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Allard Hoeve @ 2008-02-28 10:56 UTC
  To: linux-nfs


Dear Mailinglist,

We tried 2.6.23.17 and the same thing happened. The stacktrace is a bit
different, but comparable.

Is this an NFS problem in the first place? Where could we go for help with 
this problem?

Regards,

Allard Hoeve



Pid: 2643, comm:                 nfsd
EIP: 0060:[<c0179a3a>] CPU: 3
EIP is at __generic_file_splice_read+0x12c/0x418
  EFLAGS: 00000206    Not tainted  (2.6.23.17-fwsh-byte #3)
EAX: f6e9dddc EBX: 00001000 ECX: 00000001 EDX: 00000000
ESI: 00000000 EDI: f6e9dcd0 EBP: 00000095 DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: b7e72cc0 CR3: 00622000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
  [<c0113cbb>] __check_preempt_curr_fair+0x4b/0x7d
  [<c0113dcb>] entity_tick+0x47/0x54
  [<c013006b>] getnstimeofday+0x37/0x111
  [<c0132fb6>] clockevents_program_event+0xac/0xcc
  [<c0122996>] run_timer_softirq+0x30/0x184
  [<c012f36f>] hrtimer_interrupt+0x132/0x1c4
  [<c011f0e0>] __do_softirq+0xba/0xcf
  [<c010da6a>] smp_apic_timer_interrupt+0x2c/0x35
  [<c01032bc>] apic_timer_interrupt+0x28/0x30
  [<c0179da7>] generic_file_splice_read+0x81/0xd5
  [<c017a6b0>] do_splice_to+0x75/0x97
  [<c017a771>] splice_direct_to_actor+0x9f/0x166
  [<f8f2a494>] nfsd_acceptable+0x0/0xd1 [nfsd]
  [<f8f2c247>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
  [<f8f2c5ea>] nfsd_vfs_read+0x399/0x3bd [nfsd]
  [<c015d57f>] dentry_open+0x34/0x64
  [<f8f2ca1d>] nfsd_read+0xee/0xfb [nfsd]
  [<f8f332ab>] nfsd3_proc_read+0xfe/0x186 [nfsd]
  [<f8f34cd9>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
  [<f8f28847>] nfsd_dispatch+0xc5/0x1ca [nfsd]
  [<c043ab82>] svcauth_unix_set_client+0x116/0x165
  [<c0436b96>] svc_process+0x4fb/0x6d4
  [<c01164ad>] default_wake_function+0x0/0xc
  [<f8f2863d>] nfsd+0x16a/0x282 [nfsd]
  [<f8f284d3>] nfsd+0x0/0x282 [nfsd]
  [<c010343f>] kernel_thread_helper+0x7/0x10



* Re: Kernel (2.6.24) crash on nfsd (BUG: soft lockup)
       [not found] ` <47C434D2.80601-DW70C6hi67U@public.gmane.org>
  2008-02-27  6:46   ` Gertjan Oude Lohuis
@ 2008-02-28 11:08   ` Gertjan Oude Lohuis
  1 sibling, 0 replies; 8+ messages in thread
From: Gertjan Oude Lohuis @ 2008-02-28 11:08 UTC
  To: linux-nfs; +Cc: rune.nilssen-FJFKJQU35qU

[-- Attachment #1: Type: text/plain, Size: 840 bytes --]

Over the past few days, our fileserver has crashed with this bug a
couple of times. We downgraded the kernel to 2.6.23.17 last night, but
about an hour ago the machine crashed again, this time with a slightly
different stacktrace (attached).
This is driving us nuts: kernels 2.6.17 through 2.6.22 are not an option
because of the lockd issue, and 2.6.23 and 2.6.24 are not an option
because they crash even more often.

I noticed that Rune Nilssen reported the same issue a few days ago to 
this list, but he too hasn't received any response yet 
(http://article.gmane.org/gmane.linux.nfs/19105).

Is anyone else here suffering from this issue? Is there more
information I can gather or provide to make debugging easier?


-- 

Kind regards,

Gertjan Oude Lohuis
Byte Internet

W www.byte.nl
E support-DW70C6hi67U@public.gmane.org
F 020 6255 922

[-- Attachment #2: stacktrace3.txt --]
[-- Type: text/plain, Size: 3327 bytes --]

BUG: soft lockup - CPU#3 stuck for 11s! [nfsd:2643]

Pid: 2643, comm:                 nfsd
EIP: 0060:[<c0179a3a>] CPU: 3
EIP is at __generic_file_splice_read+0x12c/0x418
 EFLAGS: 00000206    Not tainted  (2.6.23.17-fwsh-byte #3)
EAX: f6e9dddc EBX: 00001000 ECX: 00000001 EDX: 00000000
ESI: 00000000 EDI: f6e9dcd0 EBP: 00000095 DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: b7e72cc0 CR3: 00622000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c0113cbb>] __check_preempt_curr_fair+0x4b/0x7d
 [<c0113dcb>] entity_tick+0x47/0x54
 [<c013006b>] getnstimeofday+0x37/0x111
 [<c0132fb6>] clockevents_program_event+0xac/0xcc
 [<c0122996>] run_timer_softirq+0x30/0x184
 [<c012f36f>] hrtimer_interrupt+0x132/0x1c4
 [<c011f0e0>] __do_softirq+0xba/0xcf
 [<c010da6a>] smp_apic_timer_interrupt+0x2c/0x35
 [<c01032bc>] apic_timer_interrupt+0x28/0x30
 [<c0179da7>] generic_file_splice_read+0x81/0xd5
 [<c017a6b0>] do_splice_to+0x75/0x97
 [<c017a771>] splice_direct_to_actor+0x9f/0x166
 [<f8f2a494>] nfsd_acceptable+0x0/0xd1 [nfsd]
 [<f8f2c247>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
 [<f8f2c5ea>] nfsd_vfs_read+0x399/0x3bd [nfsd]
 [<c015d57f>] dentry_open+0x34/0x64
 [<f8f2ca1d>] nfsd_read+0xee/0xfb [nfsd]
 [<f8f332ab>] nfsd3_proc_read+0xfe/0x186 [nfsd]
 [<f8f34cd9>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
 [<f8f28847>] nfsd_dispatch+0xc5/0x1ca [nfsd]
 [<c043ab82>] svcauth_unix_set_client+0x116/0x165
 [<c0436b96>] svc_process+0x4fb/0x6d4
 [<c01164ad>] default_wake_function+0x0/0xc
 [<f8f2863d>] nfsd+0x16a/0x282 [nfsd]
 [<f8f284d3>] nfsd+0x0/0x282 [nfsd]
 [<c010343f>] kernel_thread_helper+0x7/0x10
 =======================


Pid: 2643, comm:                 nfsd
EIP: 0060:[<c01799ed>] CPU: 3
EIP is at __generic_file_splice_read+0xdf/0x418
 EFLAGS: 00000206    Not tainted  (2.6.23.17-fwsh-byte #3)
EAX: 00000095 EBX: f6e9de50 ECX: 00000001 EDX: 00000000
ESI: 00000001 EDI: f6e9dcd0 EBP: 00000096 DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: b7e72cc0 CR3: 00622000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c0113cbb>] __check_preempt_curr_fair+0x4b/0x7d
 [<c0113dcb>] entity_tick+0x47/0x54
 [<c013006b>] getnstimeofday+0x37/0x111
 [<c0132fb6>] clockevents_program_event+0xac/0xcc
 [<c0122996>] run_timer_softirq+0x30/0x184
 [<c012f36f>] hrtimer_interrupt+0x132/0x1c4
 [<c011f0e0>] __do_softirq+0xba/0xcf
 [<c010da6a>] smp_apic_timer_interrupt+0x2c/0x35
 [<c01032bc>] apic_timer_interrupt+0x28/0x30
 [<c017007b>] find_inode_fast+0x26/0x46
 [<c0179da7>] generic_file_splice_read+0x81/0xd5
 [<c017a6b0>] do_splice_to+0x75/0x97
 [<c017a771>] splice_direct_to_actor+0x9f/0x166
 [<f8f2a494>] nfsd_acceptable+0x0/0xd1 [nfsd]
 [<f8f2c247>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
 [<f8f2c5ea>] nfsd_vfs_read+0x399/0x3bd [nfsd]
 [<c015d57f>] dentry_open+0x34/0x64
 [<f8f2ca1d>] nfsd_read+0xee/0xfb [nfsd]
 [<f8f332ab>] nfsd3_proc_read+0xfe/0x186 [nfsd]
 [<f8f34cd9>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
 [<f8f28847>] nfsd_dispatch+0xc5/0x1ca [nfsd]
 [<c043ab82>] svcauth_unix_set_client+0x116/0x165
 [<c0436b96>] svc_process+0x4fb/0x6d4
 [<c01164ad>] default_wake_function+0x0/0xc
 [<f8f2863d>] nfsd+0x16a/0x282 [nfsd]
 [<f8f284d3>] nfsd+0x0/0x282 [nfsd]
 [<c010343f>] kernel_thread_helper+0x7/0x10
 =======================



* Re: Kernel 2.6.23.17 crash (Was: Kernel (2.6.24) crash on nfsd (BUG: soft lockup))
       [not found]             ` <Pine.LNX.4.62.0802281153040.31013-FHjt3+7qfYHBZBx2VKNGNcSTQT6m/s+e@public.gmane.org>
@ 2008-03-01 16:39               ` J. Bruce Fields
  2008-03-01 17:03                 ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: J. Bruce Fields @ 2008-03-01 16:39 UTC (permalink / raw)
  To: Allard Hoeve; +Cc: linux-nfs, Nilssen, Rune, Gertjan Oude Lohuis, Jens Axboe

On Thu, Feb 28, 2008 at 11:56:51AM +0100, Allard Hoeve wrote:
> After trying 2.6.23.17, the same happened. The stacktrace is a bit  
> different, but they are comparable.
>
> Is this an NFS problem in the first place? Where could we go for help 
> with this problem?

Thanks for the reports!

So, the summary: several people are reporting soft lockup warnings with
__generic_file_splice_read as the latest or next-to-latest function on
the stack.  Sounds like 2.6.18 is good, various kernels around 2.6.23
and 2.6.24 are reported bad.  Is it possible this was a regression
introduced by the splice changes?

--b.

>
> Regards,
>
> Allard Hoeve
>
>
>
> Pid: 2643, comm:                 nfsd
> EIP: 0060:[<c0179a3a>] CPU: 3
> EIP is at __generic_file_splice_read+0x12c/0x418
>  EFLAGS: 00000206    Not tainted  (2.6.23.17-fwsh-byte #3)
> EAX: f6e9dddc EBX: 00001000 ECX: 00000001 EDX: 00000000
> ESI: 00000000 EDI: f6e9dcd0 EBP: 00000095 DS: 007b ES: 007b FS: 00d8
> CR0: 8005003b CR2: b7e72cc0 CR3: 00622000 CR4: 000006f0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff0ff0 DR7: 00000400
>  [<c0113cbb>] __check_preempt_curr_fair+0x4b/0x7d
>  [<c0113dcb>] entity_tick+0x47/0x54
>  [<c013006b>] getnstimeofday+0x37/0x111
>  [<c0132fb6>] clockevents_program_event+0xac/0xcc
>  [<c0122996>] run_timer_softirq+0x30/0x184
>  [<c012f36f>] hrtimer_interrupt+0x132/0x1c4
>  [<c011f0e0>] __do_softirq+0xba/0xcf
>  [<c010da6a>] smp_apic_timer_interrupt+0x2c/0x35
>  [<c01032bc>] apic_timer_interrupt+0x28/0x30
>  [<c0179da7>] generic_file_splice_read+0x81/0xd5
>  [<c017a6b0>] do_splice_to+0x75/0x97
>  [<c017a771>] splice_direct_to_actor+0x9f/0x166
>  [<f8f2a494>] nfsd_acceptable+0x0/0xd1 [nfsd]
>  [<f8f2c247>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
>  [<f8f2c5ea>] nfsd_vfs_read+0x399/0x3bd [nfsd]
>  [<c015d57f>] dentry_open+0x34/0x64
>  [<f8f2ca1d>] nfsd_read+0xee/0xfb [nfsd]
>  [<f8f332ab>] nfsd3_proc_read+0xfe/0x186 [nfsd]
>  [<f8f34cd9>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
>  [<f8f28847>] nfsd_dispatch+0xc5/0x1ca [nfsd]
>  [<c043ab82>] svcauth_unix_set_client+0x116/0x165
>  [<c0436b96>] svc_process+0x4fb/0x6d4
>  [<c01164ad>] default_wake_function+0x0/0xc
>  [<f8f2863d>] nfsd+0x16a/0x282 [nfsd]
>  [<f8f284d3>] nfsd+0x0/0x282 [nfsd]
>  [<c010343f>] kernel_thread_helper+0x7/0x10


* Re: Kernel 2.6.23.17 crash (Was: Kernel (2.6.24) crash on nfsd  (BUG: soft lockup))
  2008-03-01 16:39               ` J. Bruce Fields
@ 2008-03-01 17:03                 ` Jens Axboe
  2008-03-05 10:25                   ` Gertjan Oude Lohuis
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2008-03-01 17:03 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Allard Hoeve, linux-nfs, Nilssen, Rune, Gertjan Oude Lohuis

On Sat, Mar 01 2008, J. Bruce Fields wrote:
> On Thu, Feb 28, 2008 at 11:56:51AM +0100, Allard Hoeve wrote:
> > After trying 2.6.23.17, the same happened. The stacktrace is a bit  
> > different, but they are comparable.
> >
> > Is this an NFS problem in the first place? Where could we go for help 
> > with this problem?
> 
> Thanks for the reports!
> 
> So, the summary: several people are reporting soft lockup warnings with
> __generic_file_splice_read as the latest or next-to-latest function on
> the stack.  Sounds like 2.6.18 is good, various kernels around 2.6.23
> and 2.6.24 are reported bad.  Is it possible this was a regression
> introduced by the splice changes?

I posted this two days ago but haven't heard back. Has anyone who can
reproduce the problem tested it?

diff --git a/fs/splice.c b/fs/splice.c
index 9b559ee..0254ec6 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -370,8 +370,10 @@ __generic_file_splice_read(struct file *in, loff_t *ppos,
 			 * for an in-flight io page
 			 */
 			if (flags & SPLICE_F_NONBLOCK) {
-				if (TestSetPageLocked(page))
+				if (TestSetPageLocked(page)) {
+					error = -EAGAIN;
 					break;
+				}
 			} else
 				lock_page(page);
 
@@ -479,9 +481,8 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
 				 struct pipe_inode_info *pipe, size_t len,
 				 unsigned int flags)
 {
-	ssize_t spliced;
-	int ret;
 	loff_t isize, left;
+	int ret;
 
 	isize = i_size_read(in->f_mapping->host);
 	if (unlikely(*ppos >= isize))
@@ -491,29 +492,9 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
 	if (unlikely(left < len))
 		len = left;
 
-	ret = 0;
-	spliced = 0;
-	while (len && !spliced) {
-		ret = __generic_file_splice_read(in, ppos, pipe, len, flags);
-
-		if (ret < 0)
-			break;
-		else if (!ret) {
-			if (spliced)
-				break;
-			if (flags & SPLICE_F_NONBLOCK) {
-				ret = -EAGAIN;
-				break;
-			}
-		}
-
+	ret = __generic_file_splice_read(in, ppos, pipe, len, flags);
+	if (ret > 0)
 		*ppos += ret;
-		len -= ret;
-		spliced += ret;
-	}
-
-	if (spliced)
-		return spliced;
 
 	return ret;
 }

-- 
Jens Axboe
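
[Editorial sketch, not part of Jens's message.] The second hunk above removes a
retry loop from generic_file_splice_read(). Going only by the diff, that loop
could repeat without making progress whenever __generic_file_splice_read()
returned 0 and SPLICE_F_NONBLOCK was not set. The thread does not confirm that
this is what caused the reported lockups. Below is a minimal user-space model
of the two control flows. All names (model_file, inner_read, old_outer,
new_outer, MODEL_F_NONBLOCK, MAX_SPINS, MODEL_SPUN_OUT) are hypothetical
stand-ins, not kernel APIs, and the iteration cap exists only so that the
model terminates.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_F_NONBLOCK 0x02    /* hypothetical stand-in for SPLICE_F_NONBLOCK */
#define MAX_SPINS        1000    /* model-only cap; the removed loop had none */
#define MODEL_SPUN_OUT   (-1000) /* model-only marker: the cap was hit */

/* Hypothetical file whose next `busy_calls` reads yield no data. */
struct model_file {
	int busy_calls;
};

/* Stand-in for __generic_file_splice_read(): 0 while busy, else len. */
static long inner_read(struct model_file *f, size_t len)
{
	if (f->busy_calls > 0) {
		f->busy_calls--;
		return 0;
	}
	return (long)len;
}

/* Shape of the removed loop: keep calling until something is spliced. */
static long old_outer(struct model_file *f, size_t len, unsigned int flags,
		      int *calls)
{
	long ret = 0;
	long spliced = 0;

	*calls = 0;
	while (len && !spliced) {
		if (++*calls > MAX_SPINS)
			return MODEL_SPUN_OUT;	/* model only */
		ret = inner_read(f, len);
		if (ret < 0)
			break;
		if (!ret && (flags & MODEL_F_NONBLOCK)) {
			ret = -EAGAIN;
			break;
		}
		/* With ret == 0 and no NONBLOCK flag, nothing changes here. */
		len -= (size_t)ret;
		spliced += ret;
	}
	return spliced ? spliced : ret;
}

/* Shape after the patch: one call, result handed back to the caller. */
static long new_outer(struct model_file *f, size_t len, int *calls)
{
	*calls = 1;
	return inner_read(f, len);
}
```

In this model, a blocking call against a file that stays busy runs into the
cap in old_outer(), while new_outer() returns 0 after a single call and leaves
any retry decision to the caller.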



* Re: Kernel 2.6.23.17 crash (Was: Kernel (2.6.24) crash on nfsd  (BUG: soft lockup))
  2008-03-01 17:03                 ` Jens Axboe
@ 2008-03-05 10:25                   ` Gertjan Oude Lohuis
  0 siblings, 0 replies; 8+ messages in thread
From: Gertjan Oude Lohuis @ 2008-03-05 10:25 UTC (permalink / raw)
  To: Jens Axboe; +Cc: J. Bruce Fields, Allard Hoeve, linux-nfs, Nilssen, Rune

Hi Jens et al,

On 03/01/2008 06:03 PM, Jens Axboe wrote:
> On Sat, Mar 01 2008, J. Bruce Fields wrote:
>> So, the summary: several people are reporting soft lockup warnings with
>> __generic_file_splice_read as the latest or next-to-latest function on
>> the stack.  Sounds like 2.6.18 is good, various kernels around 2.6.23
>> and 2.6.24 are reported bad.  Is it possible this was a regression
>> introduced by the splice changes?
> 
> I posted this two days ago but haven't heard back. Has anyone who can
> reproduce the problem tested it?
> 
> diff --git a/fs/splice.c b/fs/splice.c

<snip patch>

I'm sorry we didn't respond earlier. We've been quite busy dividing
our data over multiple fileservers to lower the load on the primary 
server, and in the process we downgraded the kernels on the NFS-servers 
to 2.6.22.19.
Since then we haven't seen another crash. My gut feeling says that the 
downgraded kernels were the 'solution', but it could also be that the 
lowered load has prevented the servers from crashing.

At the moment we won't be able to test your patch, simply because we 
can't afford any more crashes. However, if 2.6.22.19 does crash in the 
same way in the near future, I'll try your patch.

Thanks for your interest and help!

Regards,
Gertjan Oude Lohuis



Thread overview: 8+ messages
2008-02-26 15:48 Kernel (2.6.24) crash on nfsd (BUG: soft lockup) Gertjan Oude Lohuis
     [not found] ` <47C434D2.80601-DW70C6hi67U@public.gmane.org>
2008-02-27  6:46   ` Gertjan Oude Lohuis
     [not found]     ` <47C50754.5030107-DW70C6hi67U@public.gmane.org>
2008-02-27  7:01       ` Gertjan Oude Lohuis
     [not found]         ` <47C50ABB.8050700-DW70C6hi67U@public.gmane.org>
2008-02-28 10:56           ` Kernel 2.6.23.17 crash (Was: Kernel (2.6.24) crash on nfsd (BUG: soft lockup)) Allard Hoeve
     [not found]             ` <Pine.LNX.4.62.0802281153040.31013-FHjt3+7qfYHBZBx2VKNGNcSTQT6m/s+e@public.gmane.org>
2008-03-01 16:39               ` J. Bruce Fields
2008-03-01 17:03                 ` Jens Axboe
2008-03-05 10:25                   ` Gertjan Oude Lohuis
2008-02-28 11:08   ` Kernel (2.6.24) crash on nfsd (BUG: soft lockup) Gertjan Oude Lohuis
