public inbox for linux-kernel@vger.kernel.org
* file_splice_read problem in 2.6.24.2?
@ 2008-06-04 14:48 Tristan Linnenbank
  2008-06-04 16:36 ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: Tristan Linnenbank @ 2008-06-04 14:48 UTC (permalink / raw)
  To: linux-kernel

Dear lkml,

this afternoon I had a kernel crash on one of my webboxes. 
Halting/rebooting the machine after the crash was not possible. I had to 
power cycle it.

Pid: 22361, comm: apache2 Not tainted (2.6.24.2-fwsh-byte #2)
EIP: 0060:[<c0140967>] EFLAGS: 00000286 CPU: 0
EIP is at find_get_pages_contig+0x67/0x73
EAX: 00000000 EBX: 00000010 ECX: c1c75e20 EDX: c1c75e20
ESI: 00000010 EDI: de5cb920 EBP: 00000010 ESP: d43b7cd8
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: b77f8e04 CR3: 0c78a000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
  [<c017c49c>] __generic_file_splice_read+0xa2/0x41e
  [<c0132efc>] clocksource_get_next+0x3a/0x40
  [<c0113b11>] sched_slice+0x15/0x6f
  [<c0110cb8>] read_hpet+0xa/0xd
  [<c0131291>] getnstimeofday+0x31/0x105
  [<f905ee38>] kcs_event+0xb0/0x690 [ipmi_si]
  [<c0134301>] clockevents_program_event+0xbf/0x134
  [<f905c07d>] start_next_msg+0x14/0xa1 [ipmi_si]
  [<c0122ed9>] lock_timer_base+0x27/0x51
  [<c0122f83>] __mod_timer+0x80/0x8e
  [<f905c9ba>] smi_timeout+0x0/0xfe [ipmi_si]
  [<c0123289>] run_timer_softirq+0xcf/0x184
  [<c012a893>] __rcu_process_callbacks+0x76/0xbb
  [<c011f979>] tasklet_action+0x53/0x93
  [<c011f754>] __do_softirq+0xba/0xcf
  [<c017c88d>] generic_file_splice_read+0x75/0xc9
  [<c01eda5c>] nfs_file_splice_read+0x67/0x9d
  [<c017d083>] do_splice_to+0x6e/0x90
  [<c017d144>] splice_direct_to_actor+0x9f/0x166
  [<c017d20b>] direct_splice_actor+0x0/0x31
  [<c017d2a4>] do_splice_direct+0x68/0x8b
  [<c016141a>] do_readv_writev+0x130/0x193
  [<c01617ff>] do_sendfile+0x1f5/0x256
  [<c01618b8>] sys_sendfile+0x58/0xa5
  [<c0102836>] sysenter_past_esp+0x5f/0x85
  =======================

Pid 22361 was an apache2 process.
The "-fwsh-byte" suffix to the kernel string indicates a forwarded-share 
patch to the kernel.

We (the company I work for) have had similar kernel crashes before (see 
http://article.gmane.org/gmane.linux.nfs/19130 and 
http://article.gmane.org/gmane.linux.nfs/19107). Those crashes were on 
NFS servers, but the webbox is an NFS client.

We switched the webbox to kernel 2.6.25.4 to test whether that fixes the 
problem.

Has anyone else experienced this issue?

What information can I provide to ease debugging?

As I am not a member of LKML, could you please CC me in the replies to 
the list?

Thanks in advance.

Kind regards,
Tristan Linnenbank




* Re: file_splice_read problem in 2.6.24.2?
  2008-06-04 14:48 file_splice_read problem in 2.6.24.2? Tristan Linnenbank
@ 2008-06-04 16:36 ` Jens Axboe
  2008-06-05  6:59   ` Tristan Linnenbank
  0 siblings, 1 reply; 4+ messages in thread
From: Jens Axboe @ 2008-06-04 16:36 UTC (permalink / raw)
  To: Tristan Linnenbank; +Cc: linux-kernel

On Wed, Jun 04 2008, Tristan Linnenbank wrote:
> Dear lkml,
> 
> this afternoon I had a kernel crash on one of my webboxes. 
> Halting/rebooting the machine after the crash was not possible. I 
> had to power cycle it.
> 
> Pid: 22361, comm: apache2 Not tainted (2.6.24.2-fwsh-byte #2)
> EIP: 0060:[<c0140967>] EFLAGS: 00000286 CPU: 0
> EIP is at find_get_pages_contig+0x67/0x73
> EAX: 00000000 EBX: 00000010 ECX: c1c75e20 EDX: c1c75e20
> ESI: 00000010 EDI: de5cb920 EBP: 00000010 ESP: d43b7cd8
>  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> CR0: 8005003b CR2: b77f8e04 CR3: 0c78a000 CR4: 000006f0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff0ff0 DR7: 00000400
>  [<c017c49c>] __generic_file_splice_read+0xa2/0x41e
>  [<c0132efc>] clocksource_get_next+0x3a/0x40
>  [<c0113b11>] sched_slice+0x15/0x6f
>  [<c0110cb8>] read_hpet+0xa/0xd
>  [<c0131291>] getnstimeofday+0x31/0x105
>  [<f905ee38>] kcs_event+0xb0/0x690 [ipmi_si]
>  [<c0134301>] clockevents_program_event+0xbf/0x134
>  [<f905c07d>] start_next_msg+0x14/0xa1 [ipmi_si]
>  [<c0122ed9>] lock_timer_base+0x27/0x51
>  [<c0122f83>] __mod_timer+0x80/0x8e
>  [<f905c9ba>] smi_timeout+0x0/0xfe [ipmi_si]
>  [<c0123289>] run_timer_softirq+0xcf/0x184
>  [<c012a893>] __rcu_process_callbacks+0x76/0xbb
>  [<c011f979>] tasklet_action+0x53/0x93
>  [<c011f754>] __do_softirq+0xba/0xcf
>  [<c017c88d>] generic_file_splice_read+0x75/0xc9
>  [<c01eda5c>] nfs_file_splice_read+0x67/0x9d
>  [<c017d083>] do_splice_to+0x6e/0x90
>  [<c017d144>] splice_direct_to_actor+0x9f/0x166
>  [<c017d20b>] direct_splice_actor+0x0/0x31
>  [<c017d2a4>] do_splice_direct+0x68/0x8b
>  [<c016141a>] do_readv_writev+0x130/0x193
>  [<c01617ff>] do_sendfile+0x1f5/0x256
>  [<c01618b8>] sys_sendfile+0x58/0xa5
>  [<c0102836>] sysenter_past_esp+0x5f/0x85
>  =======================
> 
> pid 22361 was an apache2 process.
> the "-fwsh-byte" suffix to the kernel string indicates a 
> forwarded-share patch to the kernel.
> 
> We (=the company I work for) had similar kernel crashes before (
> see http://article.gmane.org/gmane.linux.nfs/19130, and 
> http://article.gmane.org/gmane.linux.nfs/19107). Those crashes were 
> on nfs servers, but the webbox is an nfs client.
> 
> We switched the webbox to kernel 2.6.25.4 to test if that will fix 
> the problem.
> 
> Are there any more people that have experienced this issue before?
> 
> What information can I provide to ease debugging?
> 
> As I am not a member of LKML, could you please CC me in the replies 
> to the list?

So either this is fixed by this:

http://git.kernel.dk/?p=linux-2.6.git;a=commit;h=8191ecd1d14c6914c660dfa007154860a7908857

or it's a different bug. You should post the full oops (including any
message that came before the oops, like the 'locked up for foo seconds'
in the urls you reference above) with the Code line at the bottom as
well so we can see what the registers are used for.
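
A pasted report can be checked for the pieces requested above before it
is sent. The following is only a minimal sketch: the function name
missing_oops_parts and the three patterns are assumptions chosen for
illustration, based on the i386 report format shown in this thread, and
are not part of any kernel tool.

```python
import re


def missing_oops_parts(report):
    """Return the names of the requested pieces that `report` lacks."""
    checks = {
        # The message printed before the register dump, for example a
        # "BUG: soft lockup" notice or an "Oops:" header line.
        "preceding message": re.compile(r"BUG: |Oops: "),
        # The instruction pointer line of an i386 register dump.
        "EIP line": re.compile(r"EIP is at \S+"),
        # The "Code:" line with the bytes around the current instruction.
        "Code line": re.compile(r"\bCode: "),
    }
    return [name for name, pattern in checks.items()
            if not pattern.search(report)]
```

Run against the trace in the first mail, this would flag both the
preceding message and the Code line as absent.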

If it's the bug fixed with the above commit, then 2.6.25.x should
work. Unfortunately I'm unsure of the -stable status of the above
patch.

-- 
Jens Axboe



* Re: file_splice_read problem in 2.6.24.2?
  2008-06-04 16:36 ` Jens Axboe
@ 2008-06-05  6:59   ` Tristan Linnenbank
  2008-06-05  7:03     ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: Tristan Linnenbank @ 2008-06-05  6:59 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

Jens Axboe wrote:
> So either this is fixed by this:
> 
> http://git.kernel.dk/?p=linux-2.6.git;a=commit;h=8191ecd1d14c6914c660dfa007154860a7908857
> 
> or it's a different bug. You should post the full oops (including any
> message that came before the oops, like the 'locked up for foo seconds'
> in the urls you reference above) with the Code line at the bottom as
> well so we can see what the registers are used for.
> 
> If it's the bug fixed with the above commit, then 2.6.25.x should
> work. Unfortunately I'm unsure of the -stable status of the above
> patch.
> 
thanks for your reply.

I have appended five of the many errors to this mail. They all lock the 
CPU for 11 seconds (just like the nfsd errors we had in February/April), 
so that could be a sign of them being the same bug.
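
The observation that every report shows the same stall time can be
checked mechanically. The following is a rough sketch, not an existing
tool: summarize_lockups and both regular expressions are hypothetical
names written against the syslog format quoted in this mail. It rejoins
lines wrapped by the mail client, then counts soft lockup reports per
process, stall time and EIP symbol.

```python
import re
from collections import Counter

# Start of a report, e.g.
# "BUG: soft lockup - CPU#0 stuck for 11s! [apache2:22361]"
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)
# Instruction pointer line, e.g. "EIP is at put_page+0x7/0x20"
EIP_RE = re.compile(r"EIP is at (?P<sym>\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")
# Syslog timestamp at the start of an unwrapped line, e.g. "Jun  4 15:08:38 "
STAMP_RE = re.compile(r"[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d ")


def summarize_lockups(log_text):
    """Count reports per (comm, pid, seconds stuck, EIP symbol)."""
    # A line without a leading timestamp is treated as the wrapped
    # remainder of the previous line.
    joined = []
    for line in log_text.splitlines():
        if STAMP_RE.match(line) or not joined:
            joined.append(line.rstrip())
        else:
            joined[-1] += " " + line.strip()

    counts = Counter()
    current = None
    for line in joined:
        m = LOCKUP_RE.search(line)
        if m:
            current = (m["comm"], int(m["pid"]), int(m["secs"]))
            continue
        m = EIP_RE.search(line)
        if m and current is not None:
            counts[current + (m["sym"],)] += 1
            current = None
    return counts
```

Feeding it the appended log should yield one key per distinct EIP
symbol, which makes it easy to see whether the stalls cluster in the
same function.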

It seems to be the same problem. We've only seen this behaviour once on 
the one machine though. I'll keep a couple of webservers on 2.6.24.2 and 
some on 2.6.25.4, just to see what happens.

Thanks!

Kind regards,

Tristan

Jun  4 15:08:38 web10.c1.internal kernel: BUG: soft lockup - CPU#0 stuck 
for 11s! [apache2:22361]
Jun  4 15:08:38 web10.c1.internal kernel: Jun  4 15:08:38 
web10.c1.internal kernel: Pid: 22361, comm: apache2 Not tainted 
(2.6.24.2-fwsh-byte #2)
Jun  4 15:08:38 web10.c1.internal kernel: EIP: 0060:[<c0140967>] EFLAGS: 
00000286 CPU: 0
Jun  4 15:08:38 web10.c1.internal kernel: EIP is at 
find_get_pages_contig+0x67/0x73
Jun  4 15:08:38 web10.c1.internal kernel: EAX: 00000000 EBX: 00000010 
ECX: c1c75e20 EDX: c1c75e20
Jun  4 15:08:38 web10.c1.internal kernel: ESI: 00000010 EDI: de5cb920 
EBP: 00000010 ESP: d43b7cd8
Jun  4 15:08:38 web10.c1.internal kernel: DS: 007b ES: 007b FS: 00d8 GS: 
0033 SS: 0068
Jun  4 15:08:38 web10.c1.internal kernel: CR0: 8005003b CR2: b77f8e04 
CR3: 0c78a000 CR4: 000006f0
Jun  4 15:08:38 web10.c1.internal kernel: DR0: 00000000 DR1: 00000000 
DR2: 00000000 DR3: 00000000
Jun  4 15:08:38 web10.c1.internal kernel: DR6: ffff0ff0 DR7: 00000400
Jun  4 15:08:38 web10.c1.internal kernel: [<c017c49c>] 
__generic_file_splice_read+0xa2/0x41e
Jun  4 15:08:38 web10.c1.internal kernel: [<c0113b11>] sched_slice+0x15/0x6f
Jun  4 15:08:38 web10.c1.internal kernel: [<c0110cb8>] read_hpet+0xa/0xd
Jun  4 15:08:38 web10.c1.internal kernel: [<c0131291>] 
getnstimeofday+0x31/0x105
Jun  4 15:08:38 web10.c1.internal kernel: [<c0122ed9>] 
lock_timer_base+0x27/0x51
Jun  4 15:08:38 web10.c1.internal kernel: [<c0122f83>] __mod_timer+0x80/0x8e
Jun  4 15:08:38 web10.c1.internal kernel: [<c040dde0>] 
tcp_keepalive_timer+0x0/0x1c4
Jun  4 15:08:38 web10.c1.internal kernel: [<c03d03ce>] 
sk_reset_timer+0xc/0x16
Jun  4 15:08:38 web10.c1.internal kernel: [<c040dd9b>] 
tcp_synack_timer+0x19/0x1d
Jun  4 15:08:38 web10.c1.internal kernel: [<c040df9c>] 
tcp_keepalive_timer+0x1bc/0x1c4
Jun  4 15:08:38 web10.c1.internal kernel: [<c0123289>] 
run_timer_softirq+0xcf/0x184
Jun  4 15:08:38 web10.c1.internal kernel: [<c012a893>] 
__rcu_process_callbacks+0x76/0xbb
Jun  4 15:08:38 web10.c1.internal kernel: [<c011f979>] 
tasklet_action+0x53/0x93
Jun  4 15:08:38 web10.c1.internal kernel: [<c011f754>] 
__do_softirq+0xba/0xcf
Jun  4 15:08:38 web10.c1.internal kernel: [<c017c88d>] 
generic_file_splice_read+0x75/0xc9
Jun  4 15:08:38 web10.c1.internal kernel: [<c01eda5c>] 
nfs_file_splice_read+0x67/0x9d
Jun  4 15:08:38 web10.c1.internal kernel: [<c017d083>] 
do_splice_to+0x6e/0x90
Jun  4 15:08:38 web10.c1.internal kernel: [<c017d144>] 
splice_direct_to_actor+0x9f/0x166
Jun  4 15:08:38 web10.c1.internal kernel: [<c017d20b>] 
direct_splice_actor+0x0/0x31
Jun  4 15:08:38 web10.c1.internal kernel: [<c017d2a4>] 
do_splice_direct+0x68/0x8b
Jun  4 15:08:38 web10.c1.internal kernel: [<c016141a>] 
do_readv_writev+0x130/0x193
Jun  4 15:08:38 web10.c1.internal kernel: [<c01617ff>] 
do_sendfile+0x1f5/0x256
Jun  4 15:08:38 web10.c1.internal kernel: [<c01618b8>] 
sys_sendfile+0x58/0xa5
Jun  4 15:08:38 web10.c1.internal kernel: [<c0102836>] 
sysenter_past_esp+0x5f/0x85
Jun  4 15:08:38 web10.c1.internal kernel: =======================
Jun  4 15:08:50 web10.c1.internal kernel: BUG: soft lockup - CPU#0 stuck 
for 11s! [apache2:22361]
Jun  4 15:08:50 web10.c1.internal kernel: Jun  4 15:08:50 
web10.c1.internal kernel: Pid: 22361, comm: apache2 Not tainted 
(2.6.24.2-fwsh-byte #2)
Jun  4 15:08:50 web10.c1.internal kernel: EIP: 0060:[<c0140967>] EFLAGS: 
00000286 CPU: 0
Jun  4 15:08:50 web10.c1.internal kernel: EIP is at 
find_get_pages_contig+0x67/0x73
Jun  4 15:08:50 web10.c1.internal kernel: EAX: 00000000 EBX: 00000010 
ECX: c1c75e20 EDX: c1c75e20
Jun  4 15:08:50 web10.c1.internal kernel: ESI: 00000010 EDI: de5cb920 
EBP: 00000010 ESP: d43b7cd8
Jun  4 15:08:50 web10.c1.internal kernel: DS: 007b ES: 007b FS: 00d8 GS: 
0033 SS: 0068
Jun  4 15:08:50 web10.c1.internal kernel: CR0: 8005003b CR2: b77f8e04 
CR3: 0c78a000 CR4: 000006f0
Jun  4 15:08:50 web10.c1.internal kernel: DR0: 00000000 DR1: 00000000 
DR2: 00000000 DR3: 00000000
Jun  4 15:08:50 web10.c1.internal kernel: DR6: ffff0ff0 DR7: 00000400
Jun  4 15:08:50 web10.c1.internal kernel: [<c017c49c>] 
__generic_file_splice_read+0xa2/0x41e
Jun  4 15:08:50 web10.c1.internal kernel: [<c0113b11>] sched_slice+0x15/0x6f
Jun  4 15:08:50 web10.c1.internal kernel: [<c0110cb8>] read_hpet+0xa/0xd
Jun  4 15:08:50 web10.c1.internal kernel: [<c0131291>] 
getnstimeofday+0x31/0x105
Jun  4 15:08:50 web10.c1.internal kernel: [<c0134301>] 
clockevents_program_event+0xbf/0x134
Jun  4 15:08:50 web10.c1.internal kernel: [<c012ef49>] 
ktime_get_ts+0x15/0x47
Jun  4 15:08:50 web10.c1.internal kernel: [<c01231ea>] 
run_timer_softirq+0x30/0x184
Jun  4 15:08:50 web10.c1.internal kernel: [<c012a893>] 
__rcu_process_callbacks+0x76/0xbb
Jun  4 15:08:50 web10.c1.internal kernel: [<c011f979>] 
tasklet_action+0x53/0x93
Jun  4 15:08:50 web10.c1.internal kernel: [<c011f754>] 
__do_softirq+0xba/0xcf
Jun  4 15:08:50 web10.c1.internal kernel: [<c017c88d>] 
generic_file_splice_read+0x75/0xc9
Jun  4 15:08:50 web10.c1.internal kernel: [<c01eda5c>] 
nfs_file_splice_read+0x67/0x9d
Jun  4 15:08:50 web10.c1.internal kernel: [<c017d083>] 
do_splice_to+0x6e/0x90
Jun  4 15:08:50 web10.c1.internal kernel: [<c017d144>] 
splice_direct_to_actor+0x9f/0x166
Jun  4 15:08:50 web10.c1.internal kernel: [<c017d20b>] 
direct_splice_actor+0x0/0x31
Jun  4 15:08:50 web10.c1.internal kernel: [<c017d2a4>] 
do_splice_direct+0x68/0x8b
Jun  4 15:08:50 web10.c1.internal kernel: [<c016141a>] 
do_readv_writev+0x130/0x193
Jun  4 15:08:50 web10.c1.internal kernel: [<c01617ff>] 
do_sendfile+0x1f5/0x256
Jun  4 15:08:50 web10.c1.internal kernel: [<c01618b8>] 
sys_sendfile+0x58/0xa5
Jun  4 15:08:50 web10.c1.internal kernel: [<c0102836>] 
sysenter_past_esp+0x5f/0x85
Jun  4 15:08:51 web10.c1.internal kernel: =======================
Jun  4 15:09:02 web10.c1.internal kernel: BUG: soft lockup - CPU#0 stuck 
for 11s! [apache2:22361]
Jun  4 15:09:02 web10.c1.internal kernel: Jun  4 15:09:02 
web10.c1.internal kernel: Pid: 22361, comm: apache2 Not tainted 
(2.6.24.2-fwsh-byte #2)
Jun  4 15:09:02 web10.c1.internal kernel: EIP: 0060:[<c0147bde>] EFLAGS: 
00000246 CPU: 0
Jun  4 15:09:02 web10.c1.internal kernel: EIP is at put_page+0x7/0x20
Jun  4 15:09:02 web10.c1.internal kernel: EAX: 80000028 EBX: 00000010 
ECX: 00000010 EDX: c2180ea0
Jun  4 15:09:02 web10.c1.internal kernel: ESI: 00000000 EDI: de5cb870 
EBP: 00000000 ESP: d43b7ce8
Jun  4 15:09:02 web10.c1.internal kernel: DS: 007b ES: 007b FS: 00d8 GS: 
0033 SS: 0068
Jun  4 15:09:02 web10.c1.internal kernel: CR0: 8005003b CR2: b77f8e04 
CR3: 0c78a000 CR4: 000006f0
Jun  4 15:09:02 web10.c1.internal kernel: DR0: 00000000 DR1: 00000000 
DR2: 00000000 DR3: 00000000
Jun  4 15:09:02 web10.c1.internal kernel: DR6: ffff0ff0 DR7: 00000400
Jun  4 15:09:02 web10.c1.internal kernel: [<c017c6bc>] 
__generic_file_splice_read+0x2c2/0x41e
Jun  4 15:09:02 web10.c1.internal kernel: [<c0113b11>] sched_slice+0x15/0x6f
Jun  4 15:09:02 web10.c1.internal kernel: [<c0110cb8>] read_hpet+0xa/0xd
Jun  4 15:09:02 web10.c1.internal kernel: [<c0131291>] 
getnstimeofday+0x31/0x105
Jun  4 15:09:02 web10.c1.internal kernel: [<f905ee38>] 
kcs_event+0xb0/0x690 [ipmi_si]
Jun  4 15:09:02 web10.c1.internal kernel: [<c0134301>] 
clockevents_program_event+0xbf/0x134
Jun  4 15:09:02 web10.c1.internal kernel: [<f905c07d>] 
start_next_msg+0x14/0xa1 [ipmi_si]
Jun  4 15:09:02 web10.c1.internal kernel: [<c0122ed9>] 
lock_timer_base+0x27/0x51
Jun  4 15:09:02 web10.c1.internal kernel: [<c0122f83>] __mod_timer+0x80/0x8e
Jun  4 15:09:02 web10.c1.internal kernel: [<f905c9ba>] 
smi_timeout+0x0/0xfe [ipmi_si]
Jun  4 15:09:02 web10.c1.internal kernel: [<c0123289>] 
run_timer_softirq+0xcf/0x184
Jun  4 15:09:02 web10.c1.internal kernel: [<c012a893>] 
__rcu_process_callbacks+0x76/0xbb
Jun  4 15:09:02 web10.c1.internal kernel: [<c011f979>] 
tasklet_action+0x53/0x93
Jun  4 15:09:02 web10.c1.internal kernel: [<c011f754>] 
__do_softirq+0xba/0xcf
Jun  4 15:09:02 web10.c1.internal kernel: [<c017c88d>] 
generic_file_splice_read+0x75/0xc9
Jun  4 15:09:02 web10.c1.internal kernel: [<c01eda5c>] 
nfs_file_splice_read+0x67/0x9d
Jun  4 15:09:02 web10.c1.internal kernel: [<c017d083>] 
do_splice_to+0x6e/0x90
Jun  4 15:09:02 web10.c1.internal kernel: [<c017d144>] 
splice_direct_to_actor+0x9f/0x166
Jun  4 15:09:02 web10.c1.internal kernel: [<c017d20b>] 
direct_splice_actor+0x0/0x31
Jun  4 15:09:02 web10.c1.internal kernel: [<c017d2a4>] 
do_splice_direct+0x68/0x8b
Jun  4 15:09:02 web10.c1.internal kernel: [<c016141a>] 
do_readv_writev+0x130/0x193
Jun  4 15:09:02 web10.c1.internal kernel: [<c01617ff>] 
do_sendfile+0x1f5/0x256
Jun  4 15:09:02 web10.c1.internal kernel: [<c01618b8>] 
sys_sendfile+0x58/0xa5
Jun  4 15:09:02 web10.c1.internal kernel: [<c0102836>] 
sysenter_past_esp+0x5f/0x85
Jun  4 15:09:03 web10.c1.internal kernel: =======================
Jun  4 15:09:14 web10.c1.internal kernel: BUG: soft lockup - CPU#0 stuck 
for 11s! [apache2:22361]
Jun  4 15:09:14 web10.c1.internal kernel: Jun  4 15:09:14 
web10.c1.internal kernel: Pid: 22361, comm: apache2 Not tainted 
(2.6.24.2-fwsh-byte #2)
Jun  4 15:09:14 web10.c1.internal kernel: EIP: 0060:[<c0140967>] EFLAGS: 
00000286 CPU: 0
Jun  4 15:09:14 web10.c1.internal kernel: EIP is at 
find_get_pages_contig+0x67/0x73
Jun  4 15:09:14 web10.c1.internal kernel: EAX: 00000000 EBX: 00000010 
ECX: c1c75e20 EDX: c1c75e20
Jun  4 15:09:14 web10.c1.internal kernel: ESI: 00000010 EDI: de5cb920 
EBP: 00000010 ESP: d43b7cd8
Jun  4 15:09:14 web10.c1.internal kernel: DS: 007b ES: 007b FS: 00d8 GS: 
0033 SS: 0068
Jun  4 15:09:14 web10.c1.internal kernel: CR0: 8005003b CR2: b77f8e04 
CR3: 0c78a000 CR4: 000006f0
Jun  4 15:09:14 web10.c1.internal kernel: DR0: 00000000 DR1: 00000000 
DR2: 00000000 DR3: 00000000
Jun  4 15:09:14 web10.c1.internal kernel: DR6: ffff0ff0 DR7: 00000400
Jun  4 15:09:14 web10.c1.internal kernel: [<c017c49c>] 
__generic_file_splice_read+0xa2/0x41e
Jun  4 15:09:14 web10.c1.internal kernel: [<c0113b11>] sched_slice+0x15/0x6f
Jun  4 15:09:14 web10.c1.internal kernel: [<c0110cb8>] read_hpet+0xa/0xd
Jun  4 15:09:14 web10.c1.internal kernel: [<c0131291>] 
getnstimeofday+0x31/0x105
Jun  4 15:09:14 web10.c1.internal kernel: [<f905ee38>] 
kcs_event+0xb0/0x690 [ipmi_si]
Jun  4 15:09:14 web10.c1.internal kernel: [<c0134301>] 
clockevents_program_event+0xbf/0x134
Jun  4 15:09:14 web10.c1.internal kernel: [<f905c07d>] 
start_next_msg+0x14/0xa1 [ipmi_si]
Jun  4 15:09:14 web10.c1.internal kernel: [<c0122ed9>] 
lock_timer_base+0x27/0x51
Jun  4 15:09:14 web10.c1.internal kernel: [<c0122f83>] __mod_timer+0x80/0x8e
Jun  4 15:09:14 web10.c1.internal kernel: [<f905c9ba>] 
smi_timeout+0x0/0xfe [ipmi_si]
Jun  4 15:09:14 web10.c1.internal kernel: [<c0123289>] 
run_timer_softirq+0xcf/0x184
Jun  4 15:09:14 web10.c1.internal kernel: [<c012a893>] 
__rcu_process_callbacks+0x76/0xbb
Jun  4 15:09:14 web10.c1.internal kernel: [<c011f979>] 
tasklet_action+0x53/0x93
Jun  4 15:09:14 web10.c1.internal kernel: [<c011f754>] 
__do_softirq+0xba/0xcf
Jun  4 15:09:14 web10.c1.internal kernel: [<c017c88d>] 
generic_file_splice_read+0x75/0xc9
Jun  4 15:09:14 web10.c1.internal kernel: [<c01eda5c>] 
nfs_file_splice_read+0x67/0x9d
Jun  4 15:09:14 web10.c1.internal kernel: [<c017d083>] 
do_splice_to+0x6e/0x90
Jun  4 15:09:14 web10.c1.internal kernel: [<c017d144>] 
splice_direct_to_actor+0x9f/0x166
Jun  4 15:09:14 web10.c1.internal kernel: [<c017d20b>] 
direct_splice_actor+0x0/0x31
Jun  4 15:09:14 web10.c1.internal kernel: [<c017d2a4>] 
do_splice_direct+0x68/0x8b
Jun  4 15:09:14 web10.c1.internal kernel: [<c016141a>] 
do_readv_writev+0x130/0x193
Jun  4 15:09:14 web10.c1.internal kernel: [<c01617ff>] 
do_sendfile+0x1f5/0x256
Jun  4 15:09:14 web10.c1.internal kernel: [<c01618b8>] 
sys_sendfile+0x58/0xa5
Jun  4 15:09:14 web10.c1.internal kernel: [<c0102836>] 
sysenter_past_esp+0x5f/0x85
Jun  4 15:09:15 web10.c1.internal kernel: =======================
Jun  4 15:09:27 web10.c1.internal kernel: BUG: soft lockup - CPU#0 stuck 
for 11s! [apache2:22361]
Jun  4 15:09:27 web10.c1.internal kernel: Jun  4 15:09:27 
web10.c1.internal kernel: Pid: 22361, comm: apache2 Not tainted 
(2.6.24.2-fwsh-byte #2)
Jun  4 15:09:27 web10.c1.internal kernel: EIP: 0060:[<c0140967>] EFLAGS: 
00000286 CPU: 0
Jun  4 15:09:27 web10.c1.internal kernel: EIP is at 
find_get_pages_contig+0x67/0x73
Jun  4 15:09:27 web10.c1.internal kernel: EAX: 00000000 EBX: 00000010 
ECX: c1c75e20 EDX: c1c75e20
Jun  4 15:09:27 web10.c1.internal kernel: ESI: 00000010 EDI: de5cb920 
EBP: 00000010 ESP: d43b7cd8
Jun  4 15:09:27 web10.c1.internal kernel: DS: 007b ES: 007b FS: 00d8 GS: 
0033 SS: 0068
Jun  4 15:09:27 web10.c1.internal kernel: CR0: 8005003b CR2: b77f8e04 
CR3: 0c78a000 CR4: 000006f0
Jun  4 15:09:27 web10.c1.internal kernel: DR0: 00000000 DR1: 00000000 
DR2: 00000000 DR3: 00000000
Jun  4 15:09:27 web10.c1.internal kernel: DR6: ffff0ff0 DR7: 00000400
Jun  4 15:09:27 web10.c1.internal kernel: [<c017c49c>] 
__generic_file_splice_read+0xa2/0x41e
Jun  4 15:09:27 web10.c1.internal kernel: [<c0132efc>] 
clocksource_get_next+0x3a/0x40
Jun  4 15:09:27 web10.c1.internal kernel: [<c01315cb>] 
change_clocksource+0xc/0x205
Jun  4 15:09:27 web10.c1.internal kernel: [<c0113b11>] sched_slice+0x15/0x6f
Jun  4 15:09:27 web10.c1.internal kernel: [<c0110cb8>] read_hpet+0xa/0xd
Jun  4 15:09:27 web10.c1.internal kernel: [<c0131291>] 
getnstimeofday+0x31/0x105
Jun  4 15:09:27 web10.c1.internal kernel: [<f905ee38>] 
kcs_event+0xb0/0x690 [ipmi_si]
Jun  4 15:09:27 web10.c1.internal kernel: [<c0134301>] 
clockevents_program_event+0xbf/0x134
Jun  4 15:09:27 web10.c1.internal kernel: [<f905c07d>] 
start_next_msg+0x14/0xa1 [ipmi_si]
Jun  4 15:09:27 web10.c1.internal kernel: [<c0122ed9>] 
lock_timer_base+0x27/0x51
Jun  4 15:09:27 web10.c1.internal kernel: [<c0122f83>] __mod_timer+0x80/0x8e
Jun  4 15:09:27 web10.c1.internal kernel: [<f905c9ba>] 
smi_timeout+0x0/0xfe [ipmi_si]
Jun  4 15:09:27 web10.c1.internal kernel: [<c0123289>] 
run_timer_softirq+0xcf/0x184
Jun  4 15:09:27 web10.c1.internal kernel: [<c012a893>] 
__rcu_process_callbacks+0x76/0xbb
Jun  4 15:09:27 web10.c1.internal kernel: [<c011f979>] 
tasklet_action+0x53/0x93
Jun  4 15:09:27 web10.c1.internal kernel: [<c011f754>] 
__do_softirq+0xba/0xcf
Jun  4 15:09:27 web10.c1.internal kernel: [<c017c88d>] 
generic_file_splice_read+0x75/0xc9
Jun  4 15:09:27 web10.c1.internal kernel: [<c01eda5c>] 
nfs_file_splice_read+0x67/0x9d
Jun  4 15:09:27 web10.c1.internal kernel: [<c017d083>] 
do_splice_to+0x6e/0x90
Jun  4 15:09:27 web10.c1.internal kernel: [<c017d144>] 
splice_direct_to_actor+0x9f/0x166
Jun  4 15:09:27 web10.c1.internal kernel: [<c017d20b>] 
direct_splice_actor+0x0/0x31
Jun  4 15:09:27 web10.c1.internal kernel: [<c017d2a4>] 
do_splice_direct+0x68/0x8b
Jun  4 15:09:27 web10.c1.internal kernel: [<c016141a>] 
do_readv_writev+0x130/0x193
Jun  4 15:09:27 web10.c1.internal kernel: [<c01617ff>] 
do_sendfile+0x1f5/0x256
Jun  4 15:09:27 web10.c1.internal kernel: [<c01618b8>] 
sys_sendfile+0x58/0xa5
Jun  4 15:09:27 web10.c1.internal kernel: [<c0102836>] 
sysenter_past_esp+0x5f/0x85
Jun  4 15:09:27 web10.c1.internal kernel: =======================



* Re: file_splice_read problem in 2.6.24.2?
  2008-06-05  6:59   ` Tristan Linnenbank
@ 2008-06-05  7:03     ` Jens Axboe
  0 siblings, 0 replies; 4+ messages in thread
From: Jens Axboe @ 2008-06-05  7:03 UTC (permalink / raw)
  To: Tristan Linnenbank; +Cc: linux-kernel

On Thu, Jun 05 2008, Tristan Linnenbank wrote:
> Jens Axboe wrote:
> >So either this is fixed by this:
> >
> >http://git.kernel.dk/?p=linux-2.6.git;a=commit;h=8191ecd1d14c6914c660dfa007154860a7908857
> >
> >or it's a different bug. You should post the full oops (including any
> >message that came before the oops, like the 'locked up for foo seconds'
> >in the urls you reference above) with the Code line at the bottom as
> >well so we can see what the registers are used for.
> >
> >If it's the bug fixed with the above commit, then 2.6.25.x should
> >work. Unfortunately I'm unsure of the -stable status of the above
> >patch.
> >
> thanks for your reply.
> 
> I have appended five of the many errors to this mail. They all lock the 
> CPU for 11 seconds (just like the nfsd errors we had in February/April), 
> so that could be a sign of them being the same bug.
> 
> It seems to be the same problem. We've only seen this behaviour once on 
> the one machine though. I'll keep a couple of webservers on 2.6.24.2 and 
> some on 2.6.25.4, just to see what happens.
> 
> Thanks!
> 
> Kind regards,
> 
> Tristan
> 
> Jun  4 15:08:38 web10.c1.internal kernel: BUG: soft lockup - CPU#0 stuck 
> for 11s! [apache2:22361]

Yep, that looks like the same 'spinning in splice read' problem, so
the 2.6.25 kernel should work fine.
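
For a mixed fleet like the one described above, a release string can be
compared against the 2.6.25 threshold mentioned here. This is a sketch
under stated assumptions: kernel_version and at_least are hypothetical
helper names, and the check looks only at the version number. It cannot
tell whether a given build, including a -stable release of an older
series, actually carries the commit linked earlier in the thread.

```python
import re


def kernel_version(release):
    """Parse the leading dotted numbers of a release string into a tuple."""
    m = re.match(r"\d+(?:\.\d+)*", release)
    if m is None:
        raise ValueError(f"no version number in {release!r}")
    return tuple(int(part) for part in m.group().split("."))


def at_least(release, minimum=(2, 6, 25)):
    """True if the release's leading components are >= `minimum`."""
    return kernel_version(release)[:len(minimum)] >= minimum
```

With the two releases named in this thread, at_least("2.6.24.2-fwsh-byte")
is false and at_least("2.6.25.4") is true.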

-- 
Jens Axboe

