Linux NFS development
From: Gertjan Oude Lohuis <gertjan-DW70C6hi67U@public.gmane.org>
To: linux-nfs@vger.kernel.org
Subject: Re: Kernel (2.6.24) crash on nfsd (BUG: soft lockup)
Date: Wed, 27 Feb 2008 08:01:15 +0100
Message-ID: <47C50ABB.8050700@byte.nl>
In-Reply-To: <47C50754.5030107-DW70C6hi67U@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 843 bytes --]

Gertjan Oude Lohuis wrote:
> This morning the same server crashed again, with the same stacktrace 
> (at least to my eyes :-)). I think we'll be downgrading to 2.6.23 as 
> soon as possible. Is there anything I can do to get more debug 
> information? Now or when it crashes? When the server crashes, I'm able
> to log in to it over the serial console and reboot it with 'send
> break -> b'.

This keeps getting weirder. While browsing the server's log files, I
noticed that the server logged exactly the same errors in /var/log/messages
early yesterday, around 1:52 AM (see the attached trace). However, the
server did not crash then. We didn't notice earlier because most
notifications are suppressed during the night. Apparently, Linux can
recover from this bug, given enough time.
Can anyone with more expertise help me understand this problem?

Regards,
Gertjan Oude Lohuis

[-- Attachment #2: stacktrace2.txt --]
[-- Type: text/plain, Size: 12059 bytes --]

Feb 26 01:52:00 file1 kernel: BUG: soft lockup - CPU#3 stuck for 11s! [nfsd:2775]
Feb 26 01:52:00 file1 kernel:
Feb 26 01:52:00 file1 kernel: Pid: 2775, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
Feb 26 01:52:00 file1 kernel: EIP: 0060:[<c0147be0>] EFLAGS: 00000246 CPU: 3
Feb 26 01:52:00 file1 kernel: EIP is at put_page+0x9/0x20
Feb 26 01:52:00 file1 kernel: EAX: 80000008 EBX: 00000000 ECX: 00000002 EDX: c2a71240
Feb 26 01:52:00 file1 kernel: ESI: 00000000 EDI: e6ee08fc EBP: 00000087 ESP: f604fc7c
Feb 26 01:52:00 file1 kernel: DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Feb 26 01:52:00 file1 kernel: CR0: 8005003b CR2: 080a7070 CR3: 36cbd000 CR4: 000006f0
Feb 26 01:52:00 file1 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Feb 26 01:52:00 file1 kernel: DR6: ffff0ff0 DR7: 00000400
Feb 26 01:52:00 file1 kernel: [<c017c6bc>] __generic_file_splice_read+0x2c2/0x41e
Feb 26 01:52:00 file1 kernel: [<c0113b11>] sched_slice+0x15/0x6f
Feb 26 01:52:00 file1 kernel: [<c0131291>] getnstimeofday+0x31/0x105
Feb 26 01:52:01 file1 kernel: [<c0134301>] clockevents_program_event+0xbf/0x134
Feb 26 01:52:01 file1 kernel: [<c012ef49>] ktime_get_ts+0x15/0x47
Feb 26 01:52:01 file1 kernel: [<c01231ea>] run_timer_softirq+0x30/0x184
Feb 26 01:52:01 file1 kernel: [<c012a893>] __rcu_process_callbacks+0x76/0xbb
Feb 26 01:52:01 file1 kernel: [<c011f979>] tasklet_action+0x53/0x93
Feb 26 01:52:01 file1 kernel: [<c011f754>] __do_softirq+0xba/0xcf
Feb 26 01:52:01 file1 kernel: [<c010e20e>] smp_apic_timer_interrupt+0x2c/0x35
Feb 26 01:52:01 file1 kernel: [<c01032e0>] apic_timer_interrupt+0x28/0x30
Feb 26 01:52:01 file1 kernel: [<c017c88d>] generic_file_splice_read+0x75/0xc9
Feb 26 01:52:01 file1 kernel: [<c017d083>] do_splice_to+0x6e/0x90
Feb 26 01:52:01 file1 kernel: [<c017d144>] splice_direct_to_actor+0x9f/0x166
Feb 26 01:52:01 file1 kernel: [<f8f32f72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
Feb 26 01:52:01 file1 kernel: [<c017c818>] generic_file_splice_read+0x0/0xc9
Feb 26 01:52:01 file1 kernel: [<f8f33309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
Feb 26 01:52:01 file1 kernel: [<f8f313b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
Feb 26 01:52:01 file1 kernel: [<c016014f>] dentry_open+0x34/0x64
Feb 26 01:52:01 file1 kernel: [<f8f3373c>] nfsd_read+0xee/0xfb [nfsd]
Feb 26 01:52:01 file1 kernel: [<f8f39b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
Feb 26 01:52:01 file1 kernel: [<f8f3b4cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
Feb 26 01:52:01 file1 kernel: [<f8f2f855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
Feb 26 01:52:01 file1 kernel: [<c0445ab2>] svcauth_unix_set_client+0x116/0x165
Feb 26 01:52:02 file1 kernel: [<c0441ad1>] svc_process+0x4e9/0x6b4
Feb 26 01:52:02 file1 kernel: [<c01168e2>] default_wake_function+0x0/0x8
Feb 26 01:52:02 file1 kernel: [<f8f2f63d>] nfsd+0x16a/0x290 [nfsd]
Feb 26 01:52:02 file1 kernel: [<f8f2f4d3>] nfsd+0x0/0x290 [nfsd]
Feb 26 01:52:02 file1 kernel: [<c0103463>] kernel_thread_helper+0x7/0x10
Feb 26 01:52:02 file1 kernel: =======================
Feb 26 01:52:14 file1 kernel: BUG: soft lockup - CPU#3 stuck for 11s! [nfsd:2775]
Feb 26 01:52:14 file1 kernel:
Feb 26 01:52:14 file1 kernel: Pid: 2775, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
Feb 26 01:52:14 file1 kernel: EIP: 0060:[<c0140967>] EFLAGS: 00000286 CPU: 3
Feb 26 01:52:14 file1 kernel: EIP is at find_get_pages_contig+0x67/0x73
Feb 26 01:52:14 file1 kernel: EAX: 00000000 EBX: 00000002 ECX: c2a71260 EDX: c2a71260
Feb 26 01:52:14 file1 kernel: ESI: 00000089 EDI: e6ee09ac EBP: 00000002 ESP: f604fc6c
Feb 26 01:52:14 file1 kernel: DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Feb 26 01:52:14 file1 kernel: CR0: 8005003b CR2: 080a7070 CR3: 36cbd000 CR4: 000006f0
Feb 26 01:52:14 file1 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Feb 26 01:52:14 file1 kernel: DR6: ffff0ff0 DR7: 00000400
Feb 26 01:52:14 file1 kernel: [<c017c49c>] __generic_file_splice_read+0xa2/0x41e
Feb 26 01:52:14 file1 kernel: [<c0113b11>] sched_slice+0x15/0x6f
Feb 26 01:52:14 file1 kernel: [<c0131291>] getnstimeofday+0x31/0x105
Feb 26 01:52:14 file1 kernel: [<c0134301>] clockevents_program_event+0xbf/0x134
Feb 26 01:52:14 file1 kernel: [<c012ef49>] ktime_get_ts+0x15/0x47
Feb 26 01:52:14 file1 kernel: [<c01231ea>] run_timer_softirq+0x30/0x184
Feb 26 01:52:14 file1 kernel: [<c012a893>] __rcu_process_callbacks+0x76/0xbb
Feb 26 01:52:14 file1 kernel: [<c011f979>] tasklet_action+0x53/0x93
Feb 26 01:52:14 file1 kernel: [<c011f754>] __do_softirq+0xba/0xcf
Feb 26 01:52:14 file1 kernel: [<c010e20e>] smp_apic_timer_interrupt+0x2c/0x35
Feb 26 01:52:15 file1 kernel: [<c01032e0>] apic_timer_interrupt+0x28/0x30
Feb 26 01:52:15 file1 kernel: [<c017c88d>] generic_file_splice_read+0x75/0xc9
Feb 26 01:52:15 file1 kernel: [<c017d083>] do_splice_to+0x6e/0x90
Feb 26 01:52:15 file1 kernel: [<c017d144>] splice_direct_to_actor+0x9f/0x166
Feb 26 01:52:15 file1 kernel: [<f8f32f72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
Feb 26 01:52:15 file1 kernel: [<c017c818>] generic_file_splice_read+0x0/0xc9
Feb 26 01:52:15 file1 kernel: [<f8f33309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
Feb 26 01:52:15 file1 kernel: [<f8f313b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
Feb 26 01:52:15 file1 kernel: [<c016014f>] dentry_open+0x34/0x64
Feb 26 01:52:15 file1 kernel: [<f8f3373c>] nfsd_read+0xee/0xfb [nfsd]
Feb 26 01:52:15 file1 kernel: [<f8f39b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
Feb 26 01:52:15 file1 kernel: [<f8f3b4cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
Feb 26 01:52:15 file1 kernel: [<f8f2f855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
Feb 26 01:52:15 file1 kernel: [<c0445ab2>] svcauth_unix_set_client+0x116/0x165
Feb 26 01:52:15 file1 kernel: [<c0441ad1>] svc_process+0x4e9/0x6b4
Feb 26 01:52:15 file1 kernel: [<c01168e2>] default_wake_function+0x0/0x8
Feb 26 01:52:15 file1 kernel: [<f8f2f63d>] nfsd+0x16a/0x290 [nfsd]
Feb 26 01:52:15 file1 kernel: [<f8f2f4d3>] nfsd+0x0/0x290 [nfsd]
Feb 26 01:52:15 file1 kernel: [<c0103463>] kernel_thread_helper+0x7/0x10
Feb 26 01:52:15 file1 kernel: =======================
Feb 26 01:52:27 file1 kernel: BUG: soft lockup - CPU#3 stuck for 11s! [nfsd:2775]
Feb 26 01:52:27 file1 kernel:
Feb 26 01:52:27 file1 kernel: Pid: 2775, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
Feb 26 01:52:27 file1 kernel: EIP: 0060:[<c014096a>] EFLAGS: 00000286 CPU: 3
Feb 26 01:52:27 file1 kernel: EIP is at find_get_pages_contig+0x6a/0x73
Feb 26 01:52:27 file1 kernel: EAX: 00000002 EBX: 00000002 ECX: c2a71260 EDX: c2a71260
Feb 26 01:52:28 file1 kernel: ESI: 00000089 EDI: e6ee09ac EBP: 00000002 ESP: f604fc70
Feb 26 01:52:28 file1 kernel: DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Feb 26 01:52:28 file1 kernel: CR0: 8005003b CR2: 080a7070 CR3: 36cbd000 CR4: 000006f0
Feb 26 01:52:28 file1 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Feb 26 01:52:28 file1 kernel: DR6: ffff0ff0 DR7: 00000400
Feb 26 01:52:28 file1 kernel: [<c017c49c>] __generic_file_splice_read+0xa2/0x41e
Feb 26 01:52:28 file1 kernel: [<c0113b11>] sched_slice+0x15/0x6f
Feb 26 01:52:28 file1 kernel: [<c0131291>] getnstimeofday+0x31/0x105
Feb 26 01:52:28 file1 kernel: [<c0134301>] clockevents_program_event+0xbf/0x134
Feb 26 01:52:28 file1 kernel: [<c012ef49>] ktime_get_ts+0x15/0x47
Feb 26 01:52:28 file1 kernel: [<c01231ea>] run_timer_softirq+0x30/0x184
Feb 26 01:52:28 file1 kernel: [<c012a893>] __rcu_process_callbacks+0x76/0xbb
Feb 26 01:52:28 file1 kernel: [<c011f979>] tasklet_action+0x53/0x93
Feb 26 01:52:28 file1 kernel: [<c011f754>] __do_softirq+0xba/0xcf
Feb 26 01:52:28 file1 kernel: [<c010e20e>] smp_apic_timer_interrupt+0x2c/0x35
Feb 26 01:52:28 file1 kernel: [<c01032e0>] apic_timer_interrupt+0x28/0x30
Feb 26 01:52:28 file1 kernel: [<c01700d8>] locks_show+0x5d/0x67
Feb 26 01:52:28 file1 kernel: [<c017c88d>] generic_file_splice_read+0x75/0xc9
Feb 26 01:52:28 file1 kernel: [<c017d083>] do_splice_to+0x6e/0x90
Feb 26 01:52:28 file1 kernel: [<c017d144>] splice_direct_to_actor+0x9f/0x166
Feb 26 01:52:29 file1 kernel: [<f8f32f72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
Feb 26 01:52:29 file1 kernel: [<c017c818>] generic_file_splice_read+0x0/0xc9
Feb 26 01:52:29 file1 kernel: [<f8f33309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
Feb 26 01:52:29 file1 kernel: [<f8f313b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
Feb 26 01:52:29 file1 kernel: [<c016014f>] dentry_open+0x34/0x64
Feb 26 01:52:29 file1 kernel: [<f8f3373c>] nfsd_read+0xee/0xfb [nfsd]
Feb 26 01:52:29 file1 kernel: [<f8f39b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
Feb 26 01:52:29 file1 kernel: [<f8f3b4cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
Feb 26 01:52:29 file1 kernel: [<f8f2f855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
Feb 26 01:52:29 file1 kernel: [<c0445ab2>] svcauth_unix_set_client+0x116/0x165
Feb 26 01:52:29 file1 kernel: [<c0441ad1>] svc_process+0x4e9/0x6b4
Feb 26 01:52:29 file1 kernel: [<c01168e2>] default_wake_function+0x0/0x8
Feb 26 01:52:29 file1 kernel: [<f8f2f63d>] nfsd+0x16a/0x290 [nfsd]
Feb 26 01:52:29 file1 kernel: [<f8f2f4d3>] nfsd+0x0/0x290 [nfsd]
Feb 26 01:52:29 file1 kernel: [<c0103463>] kernel_thread_helper+0x7/0x10
Feb 26 01:52:29 file1 kernel: =======================
Feb 26 01:52:41 file1 kernel: BUG: soft lockup - CPU#3 stuck for 11s! [nfsd:2775]
Feb 26 01:52:41 file1 kernel:
Feb 26 01:52:41 file1 kernel: Pid: 2775, comm: nfsd Not tainted (2.6.24.2-fwsh-byte #2)
Feb 26 01:52:41 file1 kernel: EIP: 0060:[<c0140967>] EFLAGS: 00000286 CPU: 3
Feb 26 01:52:41 file1 kernel: EIP is at find_get_pages_contig+0x67/0x73
Feb 26 01:52:41 file1 kernel: EAX: 00000000 EBX: 00000002 ECX: c2a71260 EDX: c2a71260
Feb 26 01:52:41 file1 kernel: ESI: 00000089 EDI: e6ee09ac EBP: 00000002 ESP: f604fc6c
Feb 26 01:52:41 file1 kernel: DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Feb 26 01:52:41 file1 kernel: CR0: 8005003b CR2: 080a7070 CR3: 36cbd000 CR4: 000006f0
Feb 26 01:52:41 file1 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Feb 26 01:52:42 file1 kernel: DR6: ffff0ff0 DR7: 00000400
Feb 26 01:52:42 file1 kernel: [<c017c49c>] __generic_file_splice_read+0xa2/0x41e
Feb 26 01:52:42 file1 kernel: [<c0113b11>] sched_slice+0x15/0x6f
Feb 26 01:52:42 file1 kernel: [<c0131291>] getnstimeofday+0x31/0x105
Feb 26 01:52:42 file1 kernel: [<c0134301>] clockevents_program_event+0xbf/0x134
Feb 26 01:52:42 file1 kernel: [<c012ef49>] ktime_get_ts+0x15/0x47
Feb 26 01:52:42 file1 kernel: [<c01231ea>] run_timer_softirq+0x30/0x184
Feb 26 01:52:42 file1 kernel: [<c012a893>] __rcu_process_callbacks+0x76/0xbb
Feb 26 01:52:42 file1 kernel: [<c011f979>] tasklet_action+0x53/0x93
Feb 26 01:52:42 file1 kernel: [<c011f754>] __do_softirq+0xba/0xcf
Feb 26 01:52:42 file1 kernel: [<c010e20e>] smp_apic_timer_interrupt+0x2c/0x35
Feb 26 01:52:42 file1 kernel: [<c01032e0>] apic_timer_interrupt+0x28/0x30
Feb 26 01:52:42 file1 kernel: [<c017c88d>] generic_file_splice_read+0x75/0xc9
Feb 26 01:52:42 file1 kernel: [<c017d083>] do_splice_to+0x6e/0x90
Feb 26 01:52:42 file1 kernel: [<c017d144>] splice_direct_to_actor+0x9f/0x166
Feb 26 01:52:42 file1 kernel: [<f8f32f72>] nfsd_direct_splice_actor+0x0/0xa [nfsd]
Feb 26 01:52:42 file1 kernel: [<c017c818>] generic_file_splice_read+0x0/0xc9
Feb 26 01:52:42 file1 kernel: [<f8f33309>] nfsd_vfs_read+0x38d/0x3b1 [nfsd]
Feb 26 01:52:42 file1 kernel: [<f8f313b8>] nfsd_acceptable+0x0/0xd1 [nfsd]
Feb 26 01:52:42 file1 kernel: [<c016014f>] dentry_open+0x34/0x64
Feb 26 01:52:43 file1 kernel: [<f8f3373c>] nfsd_read+0xee/0xfb [nfsd]
Feb 26 01:52:43 file1 kernel: [<f8f39b8b>] nfsd3_proc_read+0xfe/0x186 [nfsd]
Feb 26 01:52:43 file1 kernel: [<f8f3b4cb>] nfs3svc_decode_readargs+0x0/0xeb [nfsd]
Feb 26 01:52:43 file1 kernel: [<f8f2f855>] nfsd_dispatch+0xc5/0x1ac [nfsd]
Feb 26 01:52:43 file1 kernel: [<c0445ab2>] svcauth_unix_set_client+0x116/0x165
Feb 26 01:52:43 file1 kernel: [<c0441ad1>] svc_process+0x4e9/0x6b4
Feb 26 01:52:43 file1 kernel: [<c01168e2>] default_wake_function+0x0/0x8
Feb 26 01:52:43 file1 kernel: [<f8f2f63d>] nfsd+0x16a/0x290 [nfsd]
Feb 26 01:52:43 file1 kernel: [<f8f2f4d3>] nfsd+0x0/0x290 [nfsd]
Feb 26 01:52:43 file1 kernel: [<c0103463>] kernel_thread_helper+0x7/0x10
Feb 26 01:52:43 file1 kernel: =======================
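
To check whether repeated soft-lockup reports like the ones above really are "the same stacktrace", it can help to summarize them mechanically rather than by eye. The following is only a minimal sketch in Python: the function names (summarize_lockups, eip_histogram) are made up for illustration, and the regular expressions assume the exact syslog wording seen in this attachment ("BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]" and "EIP is at symbol+0xoff/0xlen"). Other kernel versions or architectures may print these lines differently.

```python
import re
from collections import Counter

# Header of a soft-lockup report, as it appears in the attached log, e.g.
# "BUG: soft lockup - CPU#3 stuck for 11s! [nfsd:2775]"
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^:\]]+):(\d+)\]"
)

# Instruction-pointer line, e.g. "EIP is at put_page+0x9/0x20"
EIP_RE = re.compile(r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_lockups(lines):
    """Return one dict per soft-lockup report found in the given log lines."""
    reports = []
    current = None
    for line in lines:
        header = LOCKUP_RE.search(line)
        if header:
            # A new report starts here; the EIP line follows a few lines later.
            current = {
                "cpu": int(header.group(1)),
                "seconds": int(header.group(2)),
                "comm": header.group(3),
                "pid": int(header.group(4)),
                "eip": None,
            }
            reports.append(current)
            continue
        eip = EIP_RE.search(line)
        if eip and current is not None and current["eip"] is None:
            # Keep only the symbol name; offsets vary between reports.
            current["eip"] = eip.group(1)
    return reports


def eip_histogram(reports):
    """Count how often each function was executing when the lockup fired."""
    return Counter(report["eip"] for report in reports)
```

Feeding the lines of /var/log/messages (for example, from an open file object) to summarize_lockups() and passing the result to eip_histogram() gives a quick view of which functions the stuck CPU was executing. If the same process, CPU, and a small set of functions keep recurring, that supports the impression that the reports describe one and the same lockup.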
