* High load average on disk I/O on 2.6.17-rc3
@ 2006-05-05 17:10 Jason Schoonover
2006-05-06 23:03 ` bert hubert
2006-05-07 16:50 ` Andrew Morton
0 siblings, 2 replies; 40+ messages in thread
From: Jason Schoonover @ 2006-05-05 17:10 UTC (permalink / raw)
To: linux-kernel
Hi all,
I'm not sure if this is the right list to post to, so please direct me to the
appropriate list if this is the wrong one.
I'm having some problems on the latest 2.6.17-rc3 kernel and SCSI disk I/O.
Whenever I copy any large file (over 500GB) the load average starts to slowly
rise and after about a minute it is up to 7.5 and keeps on rising (depending
on how long the file takes to copy). When I watch top, the processes at the
top of the list are cp, pdflush, kjournald and kswapd.
I recently upgraded the box; it used to run Red Hat 9 with kernel 2.4.20
just fine, and this problem did not show up with 2.4.20. I recently
installed Debian/unstable on it, and that's when the problems started showing
up.
Initially the problem showed up on Debian's 2.6.15-1-686-smp kernel package, so
I upgraded to 2.6.16-1-686; same problem. I then downloaded 2.6.16.12 from
kernel.org, and finally ended up downloading and compiling 2.6.17-rc3; the same
problem occurs.
The hardware is a Dell PowerEdge 2650, dual Xeon 2.4GHz with 2GB RAM; the hard
drive controller is (as reported by lspci):
0000:04:08.1 RAID bus controller: Dell PowerEdge Expandable RAID Controller
3/Di (rev 01)
0000:05:06.0 SCSI storage controller: Adaptec RAID subsystem HBA (rev 01)
0000:05:06.1 SCSI storage controller: Adaptec RAID subsystem HBA (rev 01)
The PERC RAID configuration is four 136GB SCSI drives RAID5'd together.
Can anybody help me out here? Maybe I'm doing something wrong with the
configuration. Any help/suggestions would be great.
Thanks,
Jason Schoonover
* Re: High load average on disk I/O on 2.6.17-rc3
[not found] <69c8K-3Bu-57@gated-at.bofh.it>
@ 2006-05-05 23:12 ` Robert Hancock
2006-05-06 4:39 ` Jason Schoonover
0 siblings, 1 reply; 40+ messages in thread
From: Robert Hancock @ 2006-05-05 23:12 UTC (permalink / raw)
To: linux-kernel; +Cc: jasons
Jason Schoonover wrote:
> Hi all,
>
> I'm not sure if this is the right list to post to, so please direct me to the
> appropriate list if this is the wrong one.
>
> I'm having some problems on the latest 2.6.17-rc3 kernel and SCSI disk I/O.
> Whenever I copy any large file (over 500GB) the load average starts to slowly
> rise and after about a minute it is up to 7.5 and keeps on rising (depending
> on how long the file takes to copy). When I watch top, the processes at the
> top of the list are cp, pdflush, kjournald and kswapd.
>
Are there some processes stuck in D state? These will contribute to the
load average even if they are not using CPU.
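A quick way to check this from a shell, assuming the usual procps ps, is a
one-liner along these lines:
  # list tasks in uninterruptible (D) sleep, plus the kernel symbol each one
  # is sleeping in (the wchan column)
  ps axo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'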
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-05 23:12 ` High load average on disk I/O on 2.6.17-rc3 Robert Hancock
@ 2006-05-06 4:39 ` Jason Schoonover
2006-05-06 17:20 ` Robert Hancock
0 siblings, 1 reply; 40+ messages in thread
From: Jason Schoonover @ 2006-05-06 4:39 UTC (permalink / raw)
To: Robert Hancock; +Cc: linux-kernel
Hi Robert,
There are, this is the relevant output of the process list:
...
4659 pts/6 Ss 0:00 -bash
4671 pts/5 R+ 0:12 cp -a test-dir/ new-test
4676 ? D 0:00 [pdflush]
4679 ? D 0:00 [pdflush]
4687 pts/4 D+ 0:01 hdparm -t /dev/sda
4688 ? D 0:00 [pdflush]
4690 ? D 0:00 [pdflush]
4692 ? D 0:00 [pdflush]
...
This was when I was copying a directory and then doing a performance test with
hdparm in a separate shell. The hdparm process was in [D+] state and
basically waited until the cp was finished. During the whole thing there
were up to 5 pdflush processes in [D] state.
The 5 minute load average hit 8.90 during this test.
Does that help?
Jason
-----Original Message-----
From: Robert Hancock
Sent: Friday 05 May 2006 16:12
To: linux-kernel
Subject: Re: High load average on disk I/O on 2.6.17-rc3
Jason Schoonover wrote:
> Hi all,
>
> I'm not sure if this is the right list to post to, so please direct me to
> the appropriate list if this is the wrong one.
>
> I'm having some problems on the latest 2.6.17-rc3 kernel and SCSI disk I/O.
> Whenever I copy any large file (over 500GB) the load average starts to
> slowly rise and after about a minute it is up to 7.5 and keeps on rising
> (depending on how long the file takes to copy). When I watch top, the
> processes at the top of the list are cp, pdflush, kjournald and kswapd.
Are there some processes stuck in D state? These will contribute to the
load average even if they are not using CPU.
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-06 4:39 ` Jason Schoonover
@ 2006-05-06 17:20 ` Robert Hancock
2006-05-06 18:23 ` Jason Schoonover
0 siblings, 1 reply; 40+ messages in thread
From: Robert Hancock @ 2006-05-06 17:20 UTC (permalink / raw)
To: Jason Schoonover; +Cc: linux-kernel
Jason Schoonover wrote:
> Hi Robert,
>
> There are, this is the relevant output of the process list:
>
> ...
> 4659 pts/6 Ss 0:00 -bash
> 4671 pts/5 R+ 0:12 cp -a test-dir/ new-test
> 4676 ? D 0:00 [pdflush]
> 4679 ? D 0:00 [pdflush]
> 4687 pts/4 D+ 0:01 hdparm -t /dev/sda
> 4688 ? D 0:00 [pdflush]
> 4690 ? D 0:00 [pdflush]
> 4692 ? D 0:00 [pdflush]
> ...
>
> This was when I was copying a directory and then doing a performance test with
> hdparm in a separate shell. The hdparm process was in [D+] state and
> basically waited until the cp was finished. During the whole thing there
> were up to 5 pdflush processes in [D] state.
>
> The 5 minute load average hit 8.90 during this test.
>
> Does that help?
Well, it obviously explains why the load average is high, those D state
processes all count in the load average. It may be sort of a cosmetic
issue, since they're not actually using any CPU, but it's still a bit
unusual. For one thing, not sure why there are that many of them?
You could try enabling the SysRq triggers (if they're not already in
your kernel/distro) and doing Alt-SysRq-T which will dump the kernel
stack of all processes, that should show where exactly in the kernel
those pdflush processes are blocked..
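A rough sketch of doing the same thing without the keyboard combination,
assuming CONFIG_MAGIC_SYSRQ is enabled in the kernel:
  # enable the magic SysRq key at runtime
  echo 1 > /proc/sys/kernel/sysrq
  # equivalent of Alt-SysRq-T: dump every task's kernel stack to the kernel log
  echo t > /proc/sysrq-trigger
  # capture the log; the output file name is arbitrary
  dmesg > sysrq-t.txt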
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-06 17:20 ` Robert Hancock
@ 2006-05-06 18:23 ` Jason Schoonover
2006-05-06 20:01 ` Robert Hancock
0 siblings, 1 reply; 40+ messages in thread
From: Jason Schoonover @ 2006-05-06 18:23 UTC (permalink / raw)
To: Robert Hancock; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1745 bytes --]
Hi Robert,
I started an ncftpget and managed to get 6 pdflush processes running in
state D; hopefully this will give us a chance to debug it.
I've attached the entire Alt-SysRq-T output here because I have no idea how
to read it.
Thanks,
Jason
-----Original Message-----
From: Robert Hancock
Sent: Saturday 06 May 2006 10:20
To: Jason Schoonover
Subject: Re: High load average on disk I/O on 2.6.17-rc3
Jason Schoonover wrote:
> Hi Robert,
>
> There are, this is the relevant output of the process list:
>
> ...
> 4659 pts/6 Ss 0:00 -bash
> 4671 pts/5 R+ 0:12 cp -a test-dir/ new-test
> 4676 ? D 0:00 [pdflush]
> 4679 ? D 0:00 [pdflush]
> 4687 pts/4 D+ 0:01 hdparm -t /dev/sda
> 4688 ? D 0:00 [pdflush]
> 4690 ? D 0:00 [pdflush]
> 4692 ? D 0:00 [pdflush]
> ...
>
> This was when I was copying a directory and then doing a performance test
> with hdparm in a separate shell. The hdparm process was in [D+] state and
> basically waited until the cp was finished. During the whole thing there
> were up to 5 pdflush processes in [D] state.
>
> The 5 minute load average hit 8.90 during this test.
>
> Does that help?
Well, it obviously explains why the load average is high, those D state
processes all count in the load average. It may be sort of a cosmetic
issue, since they're not actually using any CPU, but it's still a bit
unusual. For one thing, not sure why there are that many of them?
You could try enabling the SysRq triggers (if they're not already in
your kernel/distro) and doing Alt-SysRq-T which will dump the kernel
stack of all processes, that should show where exactly in the kernel
those pdflush processes are blocked..
[-- Attachment #2: sysreq.output.txt --]
[-- Type: text/plain, Size: 31887 bytes --]
015c0ca> sys_select+0x9a/0x166 <b0104a5b> do_IRQ+0x22/0x2b
<b010267b> syscall_call+0x7/0xb
snmpd S 3C14DC00 0 3771 1 3778 3764 (NOTLB)
f7975b58 f78d8300 0000000f 3c14dc00 00000000 000186a0 d5831281 000f68dc
dfbc4688 dfbc4580 b21a0540 b2014340 7b773b00 000f9663 00000001 b01273fc
b21f0500 f78aef00 00200286 b011e999 f7975b68 f7975b68 00200286 00000000
Call Trace:
<b01273fc> add_wait_queue+0x12/0x30 <b011e999> lock_timer_base+0x15/0x2f
<b026fd89> schedule_timeout+0x6c/0x8b <b011ecff> process_timeout+0x0/0x9
<b015baca> do_select+0x3b6/0x41e <f88df929> ext3_mark_iloc_dirty+0x2a7/0x345 [ext3]
<b015b65c> __pollwait+0x0/0xb8 <b011357b> default_wake_function+0x0/0x15
<b011357b> default_wake_function+0x0/0x15 <b011357b> default_wake_function+0x0/0x15
<b011357b> default_wake_function+0x0/0x15 <b011357b> default_wake_function+0x0/0x15
<b011357b> default_wake_function+0x0/0x15 <b0113571> try_to_wake_up+0x327/0x331
<b01609e5> inode_init_once+0x4c/0x14c <b0149394> cache_alloc_refill+0x39e/0x45d
<b0149062> cache_alloc_refill+0x6c/0x45d <b01117b5> __wake_up_common+0x2a/0x4f
<b017605f> proc_alloc_inode+0x41/0x66 <b0160911> alloc_inode+0xf5/0x17d
<b016e6c0> inotify_d_instantiate+0x3b/0x5e <b015fd63> d_instantiate+0x45/0x78
<b0160557> d_rehash+0x4f/0x5f <b0178a39> proc_lookup+0x77/0xaf
<b0137669> get_page_from_freelist+0x7e/0x31a <b0137895> get_page_from_freelist+0x2aa/0x31a
<b015bd1a> core_sys_select+0x1e8/0x2a2 <b01ad8db> vsnprintf+0x423/0x461
<b0137c3a> free_hot_cold_page+0x91/0xcd <b0137c90> __pagevec_free+0x1a/0x24
<b01393de> release_pages+0x147/0x150 <b0142040> page_remove_rmap+0x27/0x2a
<b021303e> lock_sock+0x85/0x8d <b0213d18> sk_free+0xa8/0xe2
<b0210e8e> sock_destroy_inode+0x13/0x16 <b015f456> dput+0x1b/0x11e
<b015c0ca> sys_select+0x9a/0x166 <b011b272> sys_gettimeofday+0x22/0x51
<b010267b> syscall_call+0x7/0xb
smbd S 000F4240 0 3776 3764 (NOTLB)
df977fb0 00000000 00000000 000f4240 00000000 00028b0a 00000012 00000002
dff18178 dff18070 dfb78580 b200c340 1685e740 000f41ff 00000000 00000000
00000001 00000007 df977fbc dff18070 00000001 df977fbc b0282b65 00000007
Call Trace:
<b011fa32> sys_pause+0x14/0x1a <b010267b> syscall_call+0x7/0xb
sshd S 004C4B40 0 3778 1 3875 3820 3771 (NOTLB)
f7f77b60 b23c4374 00000000 004c4b40 00000000 00000000 f8545f80 000f68dd
dfb81138 dfb81030 b02b0480 b200c340 ca71c780 000f95ea 00000000 00000000
dfa76ac8 00200246 b01273fc fffffffc df989a80 00200246 b023aee4 00000000
Call Trace:
<b01273fc> add_wait_queue+0x12/0x30 <b023aee4> tcp_poll+0x24/0x134
<b026fd30> schedule_timeout+0x13/0x8b <b02112c0> sock_poll+0x13/0x17
<b015baca> do_select+0x3b6/0x41e <f88df929> ext3_mark_iloc_dirty+0x2a7/0x345 [ext3]
<b015b65c> __pollwait+0x0/0xb8 <b011357b> default_wake_function+0x0/0x15
<b011357b> default_wake_function+0x0/0x15 <b014f430> __getblk+0x1d/0x211
<f8874995> do_get_write_access+0x44a/0x45c [jbd] <b01114b5> __activate_task+0x47/0x56
<b0113571> try_to_wake_up+0x327/0x331 <f88e698a> __ext3_journal_stop+0x19/0x37 [ext3]
<b01271f4> autoremove_wake_function+0x18/0x3a <b01117b5> __wake_up_common+0x2a/0x4f
<b0149062> cache_alloc_refill+0x6c/0x45d <b01ab175> kobject_put+0x16/0x19
<b01ab69b> kobject_release+0x0/0xa <b0215457> __alloc_skb+0x4a/0xee
<b0213988> sock_alloc_send_skb+0x5a/0x188 <b0217138> memcpy_fromiovec+0x29/0x4e
<b0214593> sock_def_readable+0x11/0x5e <b02659e1> unix_stream_sendmsg+0x21b/0x2e6
<b015bd1a> core_sys_select+0x1e8/0x2a2 <b0211077> do_sock_write+0xa4/0xad
<b0137669> get_page_from_freelist+0x7e/0x31a <b0137953> __alloc_pages+0x4e/0x261
<b0142611> __page_set_anon_rmap+0x2b/0x2f <b013d845> do_wp_page+0x244/0x286
<b013d869> do_wp_page+0x268/0x286 <b0215117> skb_dequeue+0x3b/0x41
<b0160690> destroy_inode+0x48/0x4c <b015f456> dput+0x1b/0x11e
<b015c0ca> sys_select+0x9a/0x166 <b014a58a> filp_close+0x4e/0x57
<b010267b> syscall_call+0x7/0xb
rpc.statd S 006ACFC0 0 3820 1 3828 3778 (NOTLB)
df9a9b60 b01114b5 b0328460 006acfc0 00000000 00000000 00000100 00000000
dfb7a138 dfb7a030 b21a0540 b2014340 3b0f8a80 000f41ff 00000001 f78a4c80
00000246 00000246 b01273fc 00000304 df917b80 00000246 b023aee4 00000000
Call Trace:
<b01114b5> __activate_task+0x47/0x56 <b01273fc> add_wait_queue+0x12/0x30
<b023aee4> tcp_poll+0x24/0x134 <b026fd30> schedule_timeout+0x13/0x8b
<b02112c0> sock_poll+0x13/0x17 <b015baca> do_select+0x3b6/0x41e
<b015b65c> __pollwait+0x0/0xb8 <b011357b> default_wake_function+0x0/0x15
<b011357b> default_wake_function+0x0/0x15 <b011357b> default_wake_function+0x0/0x15
<b015b65c> __pollwait+0x0/0xb8 <b011357b> default_wake_function+0x0/0x15
<b021a02c> net_rx_action+0x98/0x146 <b011bdf9> __do_softirq+0x57/0xc0
<b011bfc0> local_bh_enable+0x5b/0x66 <b021aff9> dev_queue_xmit+0x1e5/0x1eb
<b02377ea> ip_output+0x1c0/0x1f6 <b02369b7> ip_push_pending_frames+0x351/0x3dd
<b02170ea> memcpy_toiovec+0x29/0x4e <b0217824> skb_copy_datagram_iovec+0x49/0x1a6
<b0215932> kfree_skbmem+0x65/0x69 <b024e64d> udp_recvmsg+0x18a/0x1d3
<b021318b> sock_common_recvmsg+0x36/0x4b <b0212a1e> sock_recvmsg+0xf1/0x10d
<b01271dc> autoremove_wake_function+0x0/0x3a <b015bd1a> core_sys_select+0x1e8/0x2a2
<b0210e04> move_addr_to_user+0x3a/0x52 <b0212d12> sys_recvfrom+0xfc/0x134
<b021303e> lock_sock+0x85/0x8d <b0211aa5> sys_bind+0x73/0xa0
<b0213d18> sk_free+0xa8/0xe2 <b0210e8e> sock_destroy_inode+0x13/0x16
<b015f456> dput+0x1b/0x11e <b015c0ca> sys_select+0x9a/0x166
<b014a58a> filp_close+0x4e/0x57 <b010267b> syscall_call+0x7/0xb
ntpd S 39E048C0 0 3828 1 3836 3820 (NOTLB)
df8d7b60 0583c2ce 00000127 39e048c0 00000000 00000000 00000000 b011bdf9
b21ddb98 b21dda90 b02b0480 b200c340 48b33bc0 000f966c 00000000 dff27540
00000246 b021773c dff27540 df98a118 df8d7bd4 df98a100 dff27540 00000000
Call Trace:
<b011bdf9> __do_softirq+0x57/0xc0 <b021773c> datagram_poll+0x21/0xc0
<b026fd30> schedule_timeout+0x13/0x8b <b02112c0> sock_poll+0x13/0x17
<b015baca> do_select+0x3b6/0x41e <f88df929> ext3_mark_iloc_dirty+0x2a7/0x345 [ext3]
<b015b65c> __pollwait+0x0/0xb8 <b011357b> default_wake_function+0x0/0x15
<b011357b> default_wake_function+0x0/0x15 <b011357b> default_wake_function+0x0/0x15
<b011357b> default_wake_function+0x0/0x15 <b021aff9> dev_queue_xmit+0x1e5/0x1eb
<b02377ea> ip_output+0x1c0/0x1f6 <b0113571> try_to_wake_up+0x327/0x331
<b0214593> sock_def_readable+0x11/0x5e <b01117b5> __wake_up_common+0x2a/0x4f
<b0111803> __wake_up+0x29/0x3c <b02145b5> sock_def_readable+0x33/0x5e
<b0243570> tcp_rcv_established+0x40e/0x69a <f886501c> ipt_hook+0x1c/0x20 [iptable_filter]
<b024963e> tcp_v4_do_rcv+0x23/0x2ba <b0232a0a> ip_local_deliver_finish+0x0/0x193
<b0232a0a> ip_local_deliver_finish+0x0/0x193 <b024a72a> tcp_v4_rcv+0x889/0x8dd
<b0232bf8> ip_local_deliver+0x5b/0x1fd <b0232cf6> ip_local_deliver+0x159/0x1fd
<b0213117> sk_reset_timer+0x12/0x1e <b02447b1> tcp_send_delayed_ack+0xb7/0xbd
<b015bd1a> core_sys_select+0x1e8/0x2a2 <f886501c> ipt_hook+0x1c/0x20 [iptable_filter]
<b011f676> __sigqueue_free+0x2a/0x2f <b010710a> convert_fxsr_to_user+0xdf/0x12e
<b0107274> save_i387+0xf0/0x105 <b0101e68> setup_sigcontext+0xde/0x11c
<b0102490> do_notify_resume+0x4ab/0x58b <b010729f> convert_fxsr_from_user+0x16/0xd6
<b015c0ca> sys_select+0x9a/0x166 <b0101c66> restore_sigcontext+0x102/0x14c
<b0101fc0> sys_sigreturn+0x98/0xbd <b010267b> syscall_call+0x7/0xb
atd S 000F4240 0 3836 1 3843 3828 (NOTLB)
df8b5f1c df8b5ee8 30b8a000 000f4240 00000000 00000000 445cddcf f7e23f5c
b21c8178 b21c8070 b21a0540 b2014340 a2ec1c80 000f93e0 00000001 b2014d94
b2014d8c b2014d8c 1268d3c8 00000282 b012996a 00000000 e1b033c8 00000000
Call Trace:
<b012996a> hrtimer_start+0xc5/0xd0 <b0270109> do_nanosleep+0x3b/0x63
<b01299b8> hrtimer_nanosleep+0x43/0xfe <b0120244> do_sigaction+0x92/0x149
<b0129596> hrtimer_wakeup+0x0/0x1c <b0129ab0> sys_nanosleep+0x3d/0x51
<b010267b> syscall_call+0x7/0xb
cron S 03A2C940 0 3843 1 3859 3836 (NOTLB)
f7e23f1c f7e23ee8 f8475800 03a2c940 00000000 00000000 445ce895 f7fb5cc4
dff6e648 dff6e540 b02b0480 b200c340 23033000 000f9663 00000000 b200cd94
b200cd8c b200cd8c 264ea2e8 00000282 b012996a 00000000 2e074ae8 00000000
Call Trace:
<b012996a> hrtimer_start+0xc5/0xd0 <b0270109> do_nanosleep+0x3b/0x63
<b01299b8> hrtimer_nanosleep+0x43/0xfe <b0120244> do_sigaction+0x92/0x149
<b0129596> hrtimer_wakeup+0x0/0x1c <b0129ab0> sys_nanosleep+0x3d/0x51
<b010267b> syscall_call+0x7/0xb
login S 00895440 0 3859 1 5750 3860 3843 (NOTLB)
df9abf68 f7f5f580 df9aa000 00895440 00000000 00000000 b18f9d4c 7ee02065
dfb81b58 dfb81a50 b02b0480 b200c340 40787740 000f95d1 00000000 f7880740
00000140 c647b080 b026f8ed df9abfb4 00000286 00000000 00895440 00000000
Call Trace:
<b026f8ed> schedule+0xa0d/0xaaf <b011a7cd> do_wait+0x8ab/0x94d
<b011357b> default_wake_function+0x0/0x15 <b011a896> sys_wait4+0x27/0x2a
<b010267b> syscall_call+0x7/0xb
getty S 000F4240 0 3860 1 3861 3859 (NOTLB)
df975edc 00000246 00000000 000f4240 00000000 00000000 00000020 b01fb342
dfb7bb98 dfb7ba90 dfb81a50 b200c340 85ce1000 000f41ff 00000000 00000000
000000ff b1fdc520 b02b5ba0 b02b5ba0 b01393de df975ec8 00000001 00000202
Call Trace:
<b01fb342> do_con_write+0x141f/0x1450 <b01393de> release_pages+0x147/0x150
<b01179cd> release_console_sem+0x157/0x191 <b026fd30> schedule_timeout+0x13/0x8b
<b01fb3ba> con_write+0x1d/0x23 <b01273fc> add_wait_queue+0x12/0x30
<b01f2383> read_chan+0x321/0x53e <b011357b> default_wake_function+0x0/0x15
<b01ee431> tty_read+0x5d/0x9f <b014bc1d> vfs_read+0xa3/0x13a
<b014c5cc> sys_read+0x3b/0x64 <b010267b> syscall_call+0x7/0xb
getty S 000F4240 0 3861 1 3862 3860 (NOTLB)
df9a7edc 00000246 00000000 000f4240 00000000 00000000 00000020 b01fb342
dfbc7138 dfbc7030 b02b0480 b200c340 861a5b40 000f41ff 00000000 00000000
000000ff b1fdc520 b02b5ba0 b02b5ba0 b01393de df9a7ec8 00000001 00000000
Call Trace:
<b01fb342> do_con_write+0x141f/0x1450 <b01393de> release_pages+0x147/0x150
<b026fd30> schedule_timeout+0x13/0x8b <b01fb3ba> con_write+0x1d/0x23
<b01273fc> add_wait_queue+0x12/0x30 <b01f2383> read_chan+0x321/0x53e
<b011357b> default_wake_function+0x0/0x15 <b01ee431> tty_read+0x5d/0x9f
<b014bc1d> vfs_read+0xa3/0x13a <b014c5cc> sys_read+0x3b/0x64
<b010267b> syscall_call+0x7/0xb
getty S 000F4240 0 3862 1 3863 3861 (NOTLB)
df99bedc 00000246 00000000 000f4240 00000000 00000000 861a5b40 000f41ff
dfbba138 dfbba030 dfbc7030 b200c340 861a5b40 000f41ff 00000000 00000000
000000ff b1fdc520 b02b5ba0 b02b5ba0 b200c380 df99bec8 00000001 00000206
Call Trace:
<b01179cd> release_console_sem+0x157/0x191 <b026fd30> schedule_timeout+0x13/0x8b
<b01fb3ba> con_write+0x1d/0x23 <b01273fc> add_wait_queue+0x12/0x30
<b01f2383> read_chan+0x321/0x53e <b011357b> default_wake_function+0x0/0x15
<b01ee431> tty_read+0x5d/0x9f <b014bc1d> vfs_read+0xa3/0x13a
<b014c5cc> sys_read+0x3b/0x64 <b010267b> syscall_call+0x7/0xb
getty S 002DC6C0 0 3863 1 3864 3862 (NOTLB)
f7ed3edc 00000246 00000000 002dc6c0 00000000 00000000 00000020 b01fb342
dffafb98 dffafa90 b21a0540 b2014340 85fbd6c0 000f41ff 00000001 00000000
000000ff b1fb66e0 b02b5ba0 b02b5ba0 b01393de f7ed3ec8 00000001 00000000
Call Trace:
<b01fb342> do_con_write+0x141f/0x1450 <b01393de> release_pages+0x147/0x150
<b026fd30> schedule_timeout+0x13/0x8b <b01fb3ba> con_write+0x1d/0x23
<b01273fc> add_wait_queue+0x12/0x30 <b01f2383> read_chan+0x321/0x53e
<b011357b> default_wake_function+0x0/0x15 <b01ee431> tty_read+0x5d/0x9f
<b014bc1d> vfs_read+0xa3/0x13a <b014c5cc> sys_read+0x3b/0x64
<b010267b> syscall_call+0x7/0xb
getty S 000F4240 0 3864 1 5964 3863 (NOTLB)
df9ededc 00000246 00000000 000f4240 00000000 00000000 00000020 b01fb342
dfb75138 dfb75030 b21a0540 b2014340 8638dfc0 000f41ff 00000001 00000000
000000ff b1fb66e0 b02b5ba0 b02b5ba0 b01393de df9edec8 00000001 00000000
Call Trace:
<b01fb342> do_con_write+0x141f/0x1450 <b01393de> release_pages+0x147/0x150
<b026fd30> schedule_timeout+0x13/0x8b <b01fb3ba> con_write+0x1d/0x23
<b01273fc> add_wait_queue+0x12/0x30 <b01f2383> read_chan+0x321/0x53e
<b011357b> default_wake_function+0x0/0x15 <b01ee431> tty_read+0x5d/0x9f
<b014bc1d> vfs_read+0xa3/0x13a <b014c5cc> sys_read+0x3b/0x64
<b010267b> syscall_call+0x7/0xb
sshd S 08B3C880 0 3875 3778 3877 3887 (NOTLB)
df99fb60 008a184e 0000012b 08b3c880 00000000 00000000 97a171c0 000f4a7c
b21dfb58 b21dfa50 dffbfa50 b200c340 3f0f6200 000f4a7d 00000000 00000000
df99fb54 b0111803 00000000 00000000 00000003 00000046 f797100c 00000286
Call Trace:
<b0111803> __wake_up+0x29/0x3c <b026fd30> schedule_timeout+0x13/0x8b
<b01ee160> tty_poll+0x4a/0x54 <b015baca> do_select+0x3b6/0x41e
<b021d346> neigh_lookup+0x9c/0xa3 <b015b65c> __pollwait+0x0/0xb8
<b011357b> default_wake_function+0x0/0x15 <b011357b> default_wake_function+0x0/0x15
<b011357b> default_wake_function+0x0/0x15 <b011357b> default_wake_function+0x0/0x15
<b0226a72> qdisc_restart+0x14/0x17f <b021aff9> dev_queue_xmit+0x1e5/0x1eb
<b02377ea> ip_output+0x1c0/0x1f6 <b023717e> ip_queue_xmit+0x34d/0x38f
<b01114b5> __activate_task+0x47/0x56 <b011e999> lock_timer_base+0x15/0x2f
<b011f2c3> __mod_timer+0x90/0x98 <b0213117> sk_reset_timer+0x12/0x1e
<b0243dea> update_send_head+0x7a/0x80 <b02468c8> __tcp_push_pending_frames+0x68f/0x74b
<b0215457> __alloc_skb+0x4a/0xee <b02133db> release_sock+0xf/0x97
<b023d9ae> tcp_sendmsg+0x8ab/0x985 <b015bd1a> core_sys_select+0x1e8/0x2a2
<b0211077> do_sock_write+0xa4/0xad <b0211625> sock_aio_write+0x56/0x63
<b014b99a> do_sync_write+0xc0/0xf3 <b01271dc> autoremove_wake_function+0x0/0x3a
<b015c0ca> sys_select+0x9a/0x166 <b014c630> sys_write+0x3b/0x64
<b010267b> syscall_call+0x7/0xb
bash S 02CD29C0 0 3877 3875 (NOTLB)
df8afedc b01ecee7 00000000 02cd29c0 00000000 00000000 df8b780c df8b7800
dffbfb58 dffbfa50 b02b0480 b200c340 3f0f6200 000f4a7d 00000000 00000010
00000000 df8aff14 dffbfa50 df8affbc b01209c2 f7814ac0 f7814c18 00000000
Call Trace:
<b01ecee7> tty_ldisc_deref+0x52/0x61 <b01209c2> dequeue_signal+0x36/0xa4
<b026fd30> schedule_timeout+0x13/0x8b <b01273fc> add_wait_queue+0x12/0x30
<b01f2383> read_chan+0x321/0x53e <b011357b> default_wake_function+0x0/0x15
<b01ee431> tty_read+0x5d/0x9f <b014bc1d> vfs_read+0xa3/0x13a
<b014c5cc> sys_read+0x3b/0x64 <b010267b> syscall_call+0x7/0xb
sshd S 1BA81400 0 3887 3778 3889 5760 3875 (NOTLB)
df8b1b60 df8b1b58 df8b1b58 1ba81400 00000000 00000000 474e1feb 000f4a7e
dffbb648 dffbb540 b02b0480 b200c340 4758da80 000f4a7e 00000000 00000001
df8b1b54 b0111803 00000000 00000000 00000003 00000046 f780480c 00000000
Call Trace:
<b0111803> __wake_up+0x29/0x3c <b026fd30> schedule_timeout+0x13/0x8b
<b01ee160> tty_poll+0x4a/0x54 <b015baca> do_select+0x3b6/0x41e
<b024a28e> tcp_v4_rcv+0x3ed/0x8dd <b015b65c> __pollwait+0x0/0xb8
<b011357b> default_wake_function+0x0/0x15 <b011357b> default_wake_function+0x0/0x15
<b011357b> default_wake_function+0x0/0x15 <b011357b> default_wake_function+0x0/0x15
<b0226a72> qdisc_restart+0x14/0x17f <b021aff9> dev_queue_xmit+0x1e5/0x1eb
<b02377ea> ip_output+0x1c0/0x1f6 <b023717e> ip_queue_xmit+0x34d/0x38f
<b021aff9> dev_queue_xmit+0x1e5/0x1eb <b02377ea> ip_output+0x1c0/0x1f6
<b023717e> ip_queue_xmit+0x34d/0x38f <b01055a9> timer_interrupt+0x72/0x7a
<b011e999> lock_timer_base+0x15/0x2f <b011f2c3> __mod_timer+0x90/0x98
<b0213117> sk_reset_timer+0x12/0x1e <b0243dea> update_send_head+0x7a/0x80
<b02468c8> __tcp_push_pending_frames+0x68f/0x74b <b0104a5b> do_IRQ+0x22/0x2b
<b0215457> __alloc_skb+0x4a/0xee <b02133db> release_sock+0xf/0x97
<b023d9ae> tcp_sendmsg+0x8ab/0x985 <b015bd1a> core_sys_select+0x1e8/0x2a2
<b0211077> do_sock_write+0xa4/0xad <b0211625> sock_aio_write+0x56/0x63
<b014b99a> do_sync_write+0xc0/0xf3 <b01271dc> autoremove_wake_function+0x0/0x3a
<b015c0ca> sys_select+0x9a/0x166 <b014c630> sys_write+0x3b/0x64
<b010267b> syscall_call+0x7/0xb
bash S 02CD29C0 0 3889 3887 (NOTLB)
df9a3edc b01ecee7 00000000 02cd29c0 00000000 00000000 4758da80 000f4a7e
dff18b98 dff18a90 dffbb540 b200c340 4758da80 000f4a7e 00000000 00000000
00000016 0202c205 00000002 1b02000a b200c380 f792d280 f7f8c634 f7f87280
Call Trace:
<b01ecee7> tty_ldisc_deref+0x52/0x61 <b026fd30> schedule_timeout+0x13/0x8b
<b01273fc> add_wait_queue+0x12/0x30 <b01f2383> read_chan+0x321/0x53e
<b011357b> default_wake_function+0x0/0x15 <b01ee431> tty_read+0x5d/0x9f
<b014bc1d> vfs_read+0xa3/0x13a <b014c5cc> sys_read+0x3b/0x64
<b010267b> syscall_call+0x7/0xb
bash S 01312D00 0 5750 3859 (NOTLB)
b85cfedc 00000246 00000000 01312d00 00000000 000186a0 00000020 b01fb342
dfe73138 dfe73030 b02b0480 b200c340 05584d80 000f966b 00000000 00000000
000000ff 00000297 b01ba027 000003d4 00000297 b201e000 df808800 00000000
Call Trace:
<b01fb342> do_con_write+0x141f/0x1450 <b01ba027> vgacon_set_cursor_size+0x34/0xd1
<b026fd30> schedule_timeout+0x13/0x8b <b01179cd> release_console_sem+0x157/0x191
<b01273fc> add_wait_queue+0x12/0x30 <b01f2383> read_chan+0x321/0x53e
<b011357b> default_wake_function+0x0/0x15 <b01ee431> tty_read+0x5d/0x9f
<b014bc1d> vfs_read+0xa3/0x13a <b014c5cc> sys_read+0x3b/0x64
<b010267b> syscall_call+0x7/0xb
sshd S 0F518240 0 5760 3778 5762 5787 3887 (NOTLB)
effd1b60 058431d4 00000127 0f518240 00000000 00000000 00000000 b011bdf9
dfbba648 dfbba540 b02b0480 b200c340 61511d00 000f966c 00000000 00000001
effd1b54 b0111803 00000000 00000000 00000003 00000046 f7d6200c 00000000
Call Trace:
<b011bdf9> __do_softirq+0x57/0xc0 <b0111803> __wake_up+0x29/0x3c
<b026fd30> schedule_timeout+0x13/0x8b <b01ee160> tty_poll+0x4a/0x54
<b015baca> do_select+0x3b6/0x41e <b015b65c> __pollwait+0x0/0xb8
<b011357b> default_wake_function+0x0/0x15 <b011357b> default_wake_function+0x0/0x15
<b011357b> default_wake_function+0x0/0x15 <b011357b> default_wake_function+0x0/0x15
<b0226a72> qdisc_restart+0x14/0x17f <b021aff9> dev_queue_xmit+0x1e5/0x1eb
<b02377ea> ip_output+0x1c0/0x1f6 <b023717e> ip_queue_xmit+0x34d/0x38f
<b021b373> netif_receive_skb+0x22c/0x24a <f89008a0> tg3_poll+0x5e8/0x643 [tg3]
<b011e999> lock_timer_base+0x15/0x2f <b011f2c3> __mod_timer+0x90/0x98
<b0213117> sk_reset_timer+0x12/0x1e <b0243dea> update_send_head+0x7a/0x80
<b02468c8> __tcp_push_pending_frames+0x68f/0x74b <b0104a5b> do_IRQ+0x22/0x2b
<b0215457> __alloc_skb+0x4a/0xee <b02133db> release_sock+0xf/0x97
<b023d9ae> tcp_sendmsg+0x8ab/0x985 <b015bd1a> core_sys_select+0x1e8/0x2a2
<b0211077> do_sock_write+0xa4/0xad <b0211625> sock_aio_write+0x56/0x63
<b014b99a> do_sync_write+0xc0/0xf3 <b01271dc> autoremove_wake_function+0x0/0x3a
<b015c0ca> sys_select+0x9a/0x166 <b014c630> sys_write+0x3b/0x64
<b010267b> syscall_call+0x7/0xb
bash S 068E7780 0 5762 5760 6115 (NOTLB)
d3b45f54 b013d869 b1d3cc80 068e7780 00000000 00000000 dfeb9740 b1d3cc80
dffbf138 dffbf030 dfbc7a50 b200c340 68f58540 000f9654 00000000 00000000
bfdb9080 b134470c 69e64065 f7d6f574 0000001d 000003d8 f789ee00 aff805a4
Call Trace:
<b013d869> do_wp_page+0x268/0x286 <b011a7cd> do_wait+0x8ab/0x94d
<b011357b> default_wake_function+0x0/0x15 <b011a896> sys_wait4+0x27/0x2a
<b011a8ac> sys_waitpid+0x13/0x17 <b010267b> syscall_call+0x7/0xb
sshd S 08C30AC0 0 5787 3778 5789 5760 (NOTLB)
b5dbfb60 f792d2e0 000005ea 08c30ac0 00000000 00000000 a4ab0bc0 000f966b
dfbc2648 dfbc2540 b02b0480 b200c340 a5ccf680 000f966b 00000000 00000001
b5dbfb54 b0111803 00000000 00000000 00000003 00000046 dcd5400c 00000000
Call Trace:
<b0111803> __wake_up+0x29/0x3c <b026fd30> schedule_timeout+0x13/0x8b
<b01ee160> tty_poll+0x4a/0x54 <b015baca> do_select+0x3b6/0x41e
<b015b65c> __pollwait+0x0/0xb8 <b011357b> default_wake_function+0x0/0x15
<b011357b> default_wake_function+0x0/0x15 <b011357b> default_wake_function+0x0/0x15
<b011357b> default_wake_function+0x0/0x15 <b0226a72> qdisc_restart+0x14/0x17f
<b021aff9> dev_queue_xmit+0x1e5/0x1eb <b02377ea> ip_output+0x1c0/0x1f6
<b023717e> ip_queue_xmit+0x34d/0x38f <b011bdf9> __do_softirq+0x57/0xc0
<b010314c> apic_timer_interrupt+0x1c/0x24 <b0244573> tcp_transmit_skb+0x5dc/0x600
<b02468c8> __tcp_push_pending_frames+0x68f/0x74b <b0215457> __alloc_skb+0x4a/0xee
<b02133db> release_sock+0xf/0x97 <b023d9ae> tcp_sendmsg+0x8ab/0x985
<b015bd1a> core_sys_select+0x1e8/0x2a2 <b0211077> do_sock_write+0xa4/0xad
<b0211625> sock_aio_write+0x56/0x63 <b014b99a> do_sync_write+0xc0/0xf3
<b01271dc> autoremove_wake_function+0x0/0x3a <b015c0ca> sys_select+0x9a/0x166
<b014c630> sys_write+0x3b/0x64 <b010267b> syscall_call+0x7/0xb
bash S 055D4A80 0 5789 5787 (NOTLB)
de097edc b01ecee7 00000000 055d4a80 00000000 0001b207 a5ccf680 000f966b
dfbc2b58 dfbc2a50 dfbc2540 b200c340 a5ccf680 000f966b 00000000 00000000
00000000 de097f14 dfbc2a50 de097fbc b200c380 f7f3f580 f7f3f6d8 f7f3f6c4
Call Trace:
<b01ecee7> tty_ldisc_deref+0x52/0x61 <b01216d0> get_signal_to_deliver+0x1c0/0x372
<b026fd30> schedule_timeout+0x13/0x8b <b01273fc> add_wait_queue+0x12/0x30
<b01f2383> read_chan+0x321/0x53e <b011357b> default_wake_function+0x0/0x15
<b01ee431> tty_read+0x5d/0x9f <b014bc1d> vfs_read+0xa3/0x13a
<b014c5cc> sys_read+0x3b/0x64 <b010267b> syscall_call+0x7/0xb
pdflush D 37E8BE80 0 5842 11 6085 3699 (L-TLB)
c7803eec b1bfa900 b1bfa920 37e8be80 00000000 00000000 fc4c7fab 000f9668
dfe67688 dfe67580 b21a0540 b2014340 805ef140 000f966c 00000001 00000000
00000286 b011e999 00000286 b011e999 c7803efc c7803efc 00000286 00000000
Call Trace:
<b011e999> lock_timer_base+0x15/0x2f <b011e999> lock_timer_base+0x15/0x2f
<b026fd89> schedule_timeout+0x6c/0x8b <b011ecff> process_timeout+0x0/0x9
<b026ebf5> io_schedule_timeout+0x29/0x33 <b01387eb> pdflush+0x0/0x1b5
<b01a0a1d> blk_congestion_wait+0x55/0x69 <b01271dc> autoremove_wake_function+0x0/0x3a
<b0137f01> background_writeout+0x7d/0x8b <b01388ee> pdflush+0x103/0x1b5
<b0137e84> background_writeout+0x0/0x8b <b01271a1> kthread+0xa3/0xd0
<b01270fe> kthread+0x0/0xd0 <b0100bc5> kernel_thread_helper+0x5/0xb
syslogd S 029F6300 0 5964 1 6051 3864 (NOTLB)
c8d39b60 f88df961 c2e1aed8 029f6300 00000000 00000000 f7c7c260 00001000
dffbd178 dffbd070 b02b0480 b200c340 baf35680 000f9666 00000000 00000000
b013595f dffe2340 00011200 b733848c 00000246 b01273fc df9d4680 00000000
Call Trace:
<f88df961> ext3_mark_iloc_dirty+0x2df/0x345 [ext3] <b013595f> mempool_alloc+0x21/0xbf
<b01273fc> add_wait_queue+0x12/0x30 <b026fd30> schedule_timeout+0x13/0x8b
<b02112c0> sock_poll+0x13/0x17 <b015baca> do_select+0x3b6/0x41e
<b015b65c> __pollwait+0x0/0xb8 <b011357b> default_wake_function+0x0/0x15
<f887355d> __journal_file_buffer+0x111/0x1e7 [jbd] <f8874995> do_get_write_access+0x44a/0x45c [jbd]
<f887355d> __journal_file_buffer+0x111/0x1e7 [jbd] <f8874df9> journal_dirty_metadata+0x173/0x19e [jbd]
<b013595f> mempool_alloc+0x21/0xbf <b013595f> mempool_alloc+0x21/0xbf
<b01a543b> as_set_request+0x1c/0x6d <b01a5a45> as_update_iohist+0x38/0x2a5
<b01a63ee> as_add_request+0xc0/0xef <b019ea34> elv_insert+0x9b/0x138
<b01a27e6> __make_request+0x319/0x350 <b01342a3> generic_file_buffered_write+0x482/0x56f
<b01a0268> generic_make_request+0x168/0x17a <b013595f> mempool_alloc+0x21/0xbf
<b01a0ad6> submit_bio+0xa5/0xaa <b015bd1a> core_sys_select+0x1e8/0x2a2
<b0133749> wait_on_page_writeback_range+0xb3/0xf7 <b011f676> __sigqueue_free+0x2a/0x2f
<b0120981> __dequeue_signal+0x14b/0x156 <b01053b7> do_gettimeofday+0x1b/0x9d
<b01053b7> do_gettimeofday+0x1b/0x9d <b011b6ba> getnstimeofday+0xf/0x25
<b01acb36> rb_insert_color+0xa6/0xc8 <b01294f4> enqueue_hrtimer+0x58/0x7f
<b012996a> hrtimer_start+0xc5/0xd0 <b011abdd> do_setitimer+0x16c/0x47d
<b015c0ca> sys_select+0x9a/0x166 <b0101fc0> sys_sigreturn+0x98/0xbd
<b010267b> syscall_call+0x7/0xb
klogd R running 0 6051 1 5964 (NOTLB)
pdflush D 0E205540 0 6085 11 6130 5842 (L-TLB)
b4d01eec f792d280 00000204 0e205540 00000000 00000000 ad7d58c0 000f966b
dffbbb58 dffbba50 b21a0540 b2014340 7e1b1bc0 000f966c 00000001 00000000
00000286 b011e999 00000286 b011e999 b4d01efc b4d01efc 00000286 00000000
Call Trace:
<b011e999> lock_timer_base+0x15/0x2f <b011e999> lock_timer_base+0x15/0x2f
<b026fd89> schedule_timeout+0x6c/0x8b <b011ecff> process_timeout+0x0/0x9
<b026ebf5> io_schedule_timeout+0x29/0x33 <b01387eb> pdflush+0x0/0x1b5
<b01a0a1d> blk_congestion_wait+0x55/0x69 <b01271dc> autoremove_wake_function+0x0/0x3a
<b0137f01> background_writeout+0x7d/0x8b <b01388ee> pdflush+0x103/0x1b5
<b0137e84> background_writeout+0x0/0x8b <b01271a1> kthread+0xa3/0xd0
<b01270fe> kthread+0x0/0xd0 <b0100bc5> kernel_thread_helper+0x5/0xb
ncftpget D 568FC100 0 6115 5762 (NOTLB)
cb351a88 87280008 b013595f 568fc100 00000008 00051615 f2a64c80 b21cc340
dfbc7b58 dfbc7a50 b02b0480 b200c340 77405900 000f966c 00000000 00000282
b011e999 f2a64cd8 00000000 dfec8b84 00000000 dfec8b84 b01a216b 00000000
Call Trace:
<b013595f> mempool_alloc+0x21/0xbf <b011e999> lock_timer_base+0x15/0x2f
<b01a216b> get_request+0x55/0x283 <b026f9b5> io_schedule+0x26/0x30
<b01a2432> get_request_wait+0x99/0xd3 <b01271dc> autoremove_wake_function+0x0/0x3a
<b01a2786> __make_request+0x2b9/0x350 <b01a0268> generic_make_request+0x168/0x17a
<b013595f> mempool_alloc+0x21/0xbf <b01a0ad6> submit_bio+0xa5/0xaa
<b015032c> bio_alloc+0x13/0x22 <b014dad0> submit_bh+0xe6/0x107
<b014f9d9> __block_write_full_page+0x20e/0x301 <f88e17e8> ext3_get_block+0x0/0xad [ext3]
<f88e0633> ext3_ordered_writepage+0xcb/0x137 [ext3] <f88e17e8> ext3_get_block+0x0/0xad [ext3]
<f88def99> bget_one+0x0/0xb [ext3] <b016a1d3> mpage_writepages+0x193/0x2e9
<f88e0568> ext3_ordered_writepage+0x0/0x137 [ext3] <b013813f> do_writepages+0x30/0x39
<b0168988> __writeback_single_inode+0x166/0x2e2 <b01aa9d3> __next_cpu+0x11/0x20
<b0136421> read_page_state_offset+0x33/0x41 <b0168f5e> sync_sb_inodes+0x185/0x23a
<b01691c6> writeback_inodes+0x6e/0xbb <b0138246> balance_dirty_pages_ratelimited_nr+0xcb/0x152
<b013429e> generic_file_buffered_write+0x47d/0x56f <f88e698a> __ext3_journal_stop+0x19/0x37 [ext3]
<f88dfde0> ext3_dirty_inode+0x5e/0x64 [ext3] <b0168b4c> __mark_inode_dirty+0x28/0x14c
<b01354da> __generic_file_aio_write_nolock+0x3c8/0x405 <b0211c91> sock_aio_read+0x56/0x63
<b013573c> generic_file_aio_write+0x61/0xb3 <f88dde72> ext3_file_write+0x26/0x92 [ext3]
<b014b99a> do_sync_write+0xc0/0xf3 <b016201a> notify_change+0x2d4/0x2e5
<b01271dc> autoremove_wake_function+0x0/0x3a <b014bdb0> vfs_write+0xa3/0x13a
<b014c630> sys_write+0x3b/0x64 <b010267b> syscall_call+0x7/0xb
pdflush D 00F42400 0 6130 11 6146 6085 (L-TLB)
f57ddeec b1964a00 b1fdf940 00f42400 00000000 00000000 b1c75ca0 b1f29320
dfb75648 dfb75540 b21a0540 b2014340 7b01a6c0 000f966c 00000001 00000000
00000286 b011e999 00000286 b011e999 f57ddefc f57ddefc 00000286 00000000
Call Trace:
<b011e999> lock_timer_base+0x15/0x2f <b011e999> lock_timer_base+0x15/0x2f
<b026fd89> schedule_timeout+0x6c/0x8b <b011ecff> process_timeout+0x0/0x9
<b026ebf5> io_schedule_timeout+0x29/0x33 <b01387eb> pdflush+0x0/0x1b5
<b01a0a1d> blk_congestion_wait+0x55/0x69 <b01271dc> autoremove_wake_function+0x0/0x3a
<b0137f01> background_writeout+0x7d/0x8b <b01388ee> pdflush+0x103/0x1b5
<b0137e84> background_writeout+0x0/0x8b <b01271a1> kthread+0xa3/0xd0
<b01270fe> kthread+0x0/0xd0 <b0100bc5> kernel_thread_helper+0x5/0xb
pdflush D 004C4B40 0 6146 11 6156 6130 (L-TLB)
dec91eec 00000001 b02f20e0 004c4b40 00000000 00000000 00000000 b02f20e0
dfb80178 dfb80070 b21a0540 b2014340 7b8afb00 000f966c 00000001 00000000
00000286 b011e999 00000286 b011e999 dec91efc dec91efc 00000286 00000000
Call Trace:
<b011e999> lock_timer_base+0x15/0x2f <b011e999> lock_timer_base+0x15/0x2f
<b026fd89> schedule_timeout+0x6c/0x8b <b011ecff> process_timeout+0x0/0x9
<b026ebf5> io_schedule_timeout+0x29/0x33 <b01387eb> pdflush+0x0/0x1b5
<b01a0a1d> blk_congestion_wait+0x55/0x69 <b01271dc> autoremove_wake_function+0x0/0x3a
<b0137f01> background_writeout+0x7d/0x8b <b01388ee> pdflush+0x103/0x1b5
<b0137e84> background_writeout+0x0/0x8b <b01271a1> kthread+0xa3/0xd0
<b01270fe> kthread+0x0/0xd0 <b0100bc5> kernel_thread_helper+0x5/0xb
pdflush D 04B571C0 0 6156 11 6168 6146 (L-TLB)
d959feec b1b7c500 b1f0e020 04b571c0 00000000 00000000 14c754c0 000f9669
dff15b98 dff15a90 b21a0540 b2014340 7b10e900 000f966c 00000001 00000000
00000286 b011e999 00000286 b011e999 d959fefc d959fefc 00000286 00000000
Call Trace:
<b011e999> lock_timer_base+0x15/0x2f <b011e999> lock_timer_base+0x15/0x2f
<b026fd89> schedule_timeout+0x6c/0x8b <b011ecff> process_timeout+0x0/0x9
<b026ebf5> io_schedule_timeout+0x29/0x33 <b01387eb> pdflush+0x0/0x1b5
<b01a0a1d> blk_congestion_wait+0x55/0x69 <b01271dc> autoremove_wake_function+0x0/0x3a
<b0137f01> background_writeout+0x7d/0x8b <b01388ee> pdflush+0x103/0x1b5
<b0137e84> background_writeout+0x0/0x8b <b01271a1> kthread+0xa3/0xd0
<b01270fe> kthread+0x0/0xd0 <b0100bc5> kernel_thread_helper+0x5/0xb
pdflush D 000F4240 0 6168 11 6156 (L-TLB)
caa13eec 0000000a b23ec9cc 000f4240 00000000 00000000 b01117b5 ebb31f74
b21dd178 b21dd070 b21a0540 b2014340 7ee17900 000f966c 00000001 00000000
00000286 b011e999 00000286 b011e999 caa13efc caa13efc 00000286 00000000
Call Trace:
<b01117b5> __wake_up_common+0x2a/0x4f <b011e999> lock_timer_base+0x15/0x2f
<b011e999> lock_timer_base+0x15/0x2f <b026fd89> schedule_timeout+0x6c/0x8b
<b011ecff> process_timeout+0x0/0x9 <b026ebf5> io_schedule_timeout+0x29/0x33
<b01387eb> pdflush+0x0/0x1b5 <b01a0a1d> blk_congestion_wait+0x55/0x69
<b01271dc> autoremove_wake_function+0x0/0x3a <b0137f01> background_writeout+0x7d/0x8b
<b01388ee> pdflush+0x103/0x1b5 <b0137e84> background_writeout+0x0/0x8b
<b01271a1> kthread+0xa3/0xd0 <b01270fe> kthread+0x0/0xd0
<b0100bc5> kernel_thread_helper+0x5/0xb
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-06 18:23 ` Jason Schoonover
@ 2006-05-06 20:01 ` Robert Hancock
0 siblings, 0 replies; 40+ messages in thread
From: Robert Hancock @ 2006-05-06 20:01 UTC (permalink / raw)
To: Jason Schoonover; +Cc: linux-kernel
Jason Schoonover wrote:
> Hi Robert,
>
> I started an ncftpget and managed to get 6 pdflush processes running in
> state D; hopefully this will give us a chance to debug it.
>
> I've attached the entire Alt-SysRq-T output here because I have no idea how
> to read it.
Well, I think the relevant parts would be:
> pdflush D 37E8BE80 0 5842 11 6085 3699 (L-TLB)
> c7803eec b1bfa900 b1bfa920 37e8be80 00000000 00000000 fc4c7fab 000f9668
> dfe67688 dfe67580 b21a0540 b2014340 805ef140 000f966c 00000001 00000000
> 00000286 b011e999 00000286 b011e999 c7803efc c7803efc 00000286 00000000
> Call Trace:
> <b011e999> lock_timer_base+0x15/0x2f <b011e999> lock_timer_base+0x15/0x2f
> <b026fd89> schedule_timeout+0x6c/0x8b <b011ecff> process_timeout+0x0/0x9
> <b026ebf5> io_schedule_timeout+0x29/0x33 <b01387eb> pdflush+0x0/0x1b5
> <b01a0a1d> blk_congestion_wait+0x55/0x69 <b01271dc> autoremove_wake_function+0x0/0x3a
> <b0137f01> background_writeout+0x7d/0x8b <b01388ee> pdflush+0x103/0x1b5
> <b0137e84> background_writeout+0x0/0x8b <b01271a1> kthread+0xa3/0xd0
> <b01270fe> kthread+0x0/0xd0 <b0100bc5> kernel_thread_helper+0x5/0xb
> ncftpget D 568FC100 0 6115 5762 (NOTLB)
> cb351a88 87280008 b013595f 568fc100 00000008 00051615 f2a64c80 b21cc340
> dfbc7b58 dfbc7a50 b02b0480 b200c340 77405900 000f966c 00000000 00000282
> b011e999 f2a64cd8 00000000 dfec8b84 00000000 dfec8b84 b01a216b 00000000
> Call Trace:
> <b013595f> mempool_alloc+0x21/0xbf <b011e999> lock_timer_base+0x15/0x2f
> <b01a216b> get_request+0x55/0x283 <b026f9b5> io_schedule+0x26/0x30
> <b01a2432> get_request_wait+0x99/0xd3 <b01271dc> autoremove_wake_function+0x0/0x3a
> <b01a2786> __make_request+0x2b9/0x350 <b01a0268> generic_make_request+0x168/0x17a
> <b013595f> mempool_alloc+0x21/0xbf <b01a0ad6> submit_bio+0xa5/0xaa
> <b015032c> bio_alloc+0x13/0x22 <b014dad0> submit_bh+0xe6/0x107
> <b014f9d9> __block_write_full_page+0x20e/0x301 <f88e17e8> ext3_get_block+0x0/0xad [ext3]
> <f88e0633> ext3_ordered_writepage+0xcb/0x137 [ext3] <f88e17e8> ext3_get_block+0x0/0xad [ext3]
> <f88def99> bget_one+0x0/0xb [ext3] <b016a1d3> mpage_writepages+0x193/0x2e9
> <f88e0568> ext3_ordered_writepage+0x0/0x137 [ext3] <b013813f> do_writepages+0x30/0x39
> <b0168988> __writeback_single_inode+0x166/0x2e2 <b01aa9d3> __next_cpu+0x11/0x20
> <b0136421> read_page_state_offset+0x33/0x41 <b0168f5e> sync_sb_inodes+0x185/0x23a
> <b01691c6> writeback_inodes+0x6e/0xbb <b0138246> balance_dirty_pages_ratelimited_nr+0xcb/0x152
> <b013429e> generic_file_buffered_write+0x47d/0x56f <f88e698a> __ext3_journal_stop+0x19/0x37 [ext3]
> <f88dfde0> ext3_dirty_inode+0x5e/0x64 [ext3] <b0168b4c> __mark_inode_dirty+0x28/0x14c
> <b01354da> __generic_file_aio_write_nolock+0x3c8/0x405 <b0211c91> sock_aio_read+0x56/0x63
> <b013573c> generic_file_aio_write+0x61/0xb3 <f88dde72> ext3_file_write+0x26/0x92 [ext3]
> <b014b99a> do_sync_write+0xc0/0xf3 <b016201a> notify_change+0x2d4/0x2e5
> <b01271dc> autoremove_wake_function+0x0/0x3a <b014bdb0> vfs_write+0xa3/0x13a
> <b014c630> sys_write+0x3b/0x64 <b010267b> syscall_call+0x7/0xb
It looks like the pdflush threads are sitting in uninterruptible sleep
waiting for a block queue to become uncongested. This seems somewhat
reasonable to me in this situation, but someone more familiar with the
block layer would likely have to comment on whether this is the expected
behavior..
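A rough way to pull just those stanzas out of a capture like the attached
sysreq.output.txt (assuming each trace starts with the task name at the start
of a line, as above):
  # print each pdflush header plus the dozen or so lines of trace that follow
  grep -A 12 '^pdflush' sysreq.output.txt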
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-05 17:10 Jason Schoonover
@ 2006-05-06 23:03 ` bert hubert
2006-05-07 1:02 ` Jason Schoonover
2006-05-07 16:50 ` Andrew Morton
1 sibling, 1 reply; 40+ messages in thread
From: bert hubert @ 2006-05-06 23:03 UTC (permalink / raw)
To: Jason Schoonover; +Cc: linux-kernel
On Fri, May 05, 2006 at 10:10:19AM -0700, Jason Schoonover wrote:
> Whenever I copy any large file (over 500GB) the load average starts to slowly
> rise and after about a minute it is up to 7.5 and keeps on rising (depending
> on how long the file takes to copy). When I watch top, the processes at the
> top of the list are cp, pdflush, kjournald and kswapd.
Load average is a bit of an odd metric in this case; try looking at the
output from 'vmstat 1', and especially the 'id' column. As long as that
isn't pinned at zero because of 'us' and 'sy', you don't have an actual problem.
The number of processes in the runqueue doesn't really tell you anything
about how much CPU you are using.
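A minimal sketch of that, with the column meanings as documented in the
vmstat man page (the one-second interval and sample count are arbitrary):
  # 'b'  = tasks blocked on I/O, 'us'/'sy' = CPU actually busy,
  # 'wa' = CPU idle but waiting for I/O, 'id' = truly idle
  vmstat 1 10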
Having said that, I think there might be a problem to be solved.
Bert
--
http://www.PowerDNS.com Open source, database driven DNS Software
http://netherlabs.nl Open and Closed source services
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-06 23:03 ` bert hubert
@ 2006-05-07 1:02 ` Jason Schoonover
2006-05-07 10:54 ` bert hubert
0 siblings, 1 reply; 40+ messages in thread
From: Jason Schoonover @ 2006-05-07 1:02 UTC (permalink / raw)
To: bert hubert; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1527 bytes --]
Hi Bert,
That's interesting; I didn't know that about the vmstat command, and I will
definitely use it more in the future. I went ahead and did an ncftpget and
started vmstat 1 at the same time; I've attached the output here. It looks
like the 'id' column was at 100, and then as soon as I started ncftpget, it
went down to 0 the whole time.
The interesting thing was that after I did a Ctrl-C on the ncftpget, the id
column was still at 0, even though the ncftpget process was over. The id
column was at 0 and the 'wa' column was at 98, up until all of the pdflush
processes ended.
Is that the expected behavior?
Jason
-----Original Message-----
From: bert hubert
Sent: Saturday 06 May 2006 16:03
To: Jason Schoonover
Subject: Re: High load average on disk I/O on 2.6.17-rc3
On Fri, May 05, 2006 at 10:10:19AM -0700, Jason Schoonover wrote:
> Whenever I copy any large file (over 500GB) the load average starts to
> slowly rise and after about a minute it is up to 7.5 and keeps on rising
> (depending on how long the file takes to copy). When I watch top, the
> processes at the top of the list are cp, pdflush, kjournald and kswapd.
Load average is a bit of an odd metric in this case; try looking at the
output from 'vmstat 1', and especially the 'id' column. As long as that
isn't pinned at zero because of 'us' and 'sy', you don't have an actual problem.
The number of processes in the runqueue doesn't really tell you anything
about how much CPU you are using.
Having said that, I think there might be a problem to be solved.
Bert
[-- Attachment #2: vmstat-1.txt --]
[-- Type: text/plain, Size: 3476 bytes --]
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 40 1991860 9444 14900 0 0 57 136 148 8 0 0 98 1
0 0 40 1991860 9452 14892 0 0 0 372 1048 24 0 0 100 0
0 0 40 1991860 9464 14880 0 0 4 36 1051 75 0 0 100 1
0 0 40 1991860 9464 14880 0 0 0 0 1032 52 0 0 100 0
0 0 40 1991860 9464 14880 0 0 0 0 1033 42 0 0 100 0
1 0 40 1971532 9536 35480 0 0 236 0 4944 401 1 10 87 4
1 0 40 1891056 9620 114616 0 0 4 176 17382 276 1 35 65 0
1 1 40 1805372 9712 198232 0 0 4 48784 18660 285 1 38 52 9
1 3 40 1723160 9792 279208 0 0 0 70868 18481 318 1 39 19 43
0 4 40 1704684 9812 297344 0 0 4 16480 4864 127 0 8 0 92
1 3 40 1657068 9860 344148 0 0 0 18040 10767 245 0 21 0 78
0 2 40 1581072 9940 420636 0 0 0 2436 16886 332 1 40 17 43
1 4 40 1521800 10000 477152 0 0 4 34660 12934 306 0 30 12 58
1 4 40 1441696 10076 554936 0 0 0 26912 17277 338 0 35 0 64
0 4 40 1399660 10120 595760 0 0 4 31944 9725 251 0 19 0 81
1 4 40 1326128 10192 667292 0 0 0 34668 15984 339 1 33 0 67
1 4 40 1273056 10244 718648 0 0 4 35376 11855 266 0 24 0 76
1 4 40 1190968 10324 798468 0 0 0 10228 17557 357 1 35 5 59
1 4 40 1112724 10396 874624 0 0 0 27844 16982 376 1 34 5 61
0 5 40 1044168 10468 941668 0 0 4 32492 15184 346 0 30 2 67
1 4 40 997668 10512 986912 0 0 0 22632 10538 244 1 22 0 78
0 4 40 926120 10576 1056344 0 0 0 13724 15436 332 1 34 0 65
1 5 40 889656 10628 1094712 0 0 4 28000 9218 285 0 22 5 74
0 6 40 859028 10660 1124600 0 0 0 13768 7295 196 0 15 0 84
0 7 40 833856 10684 1148852 0 0 0 12324 6169 165 0 12 0 88
Did a Ctrl-C here
0 2 40 47312 7888 1920116 0 0 0 36452 1353 52 0 2 39 60
0 2 40 47816 7888 1920116 0 0 0 36264 1354 56 0 1 0 99
0 2 40 48312 7888 1920116 0 0 0 36248 1362 52 0 1 0 99
0 2 40 48808 7888 1920116 0 0 0 36040 1362 56 0 1 0 99
0 2 40 49180 7888 1920116 0 0 0 36592 1360 54 0 2 0 99
0 2 40 49552 7888 1920116 0 0 0 36064 1355 56 0 2 0 98
0 2 40 50048 7888 1920116 0 0 0 36576 1360 54 0 2 0 99
0 2 40 50544 7888 1920116 0 0 0 36340 1351 57 0 1 0 99
0 2 40 51040 7888 1920116 0 0 0 36164 1352 51 0 2 0 99
0 2 40 51412 7888 1920116 0 0 0 36044 1367 57 0 1 0 99
0 2 40 51908 7888 1920116 0 0 0 36204 1359 51 0 1 0 99
0 2 40 52404 7888 1920116 0 0 0 36012 1363 55 0 2 0 99
0 2 40 52776 7888 1920116 0 0 0 36280 1362 68 0 1 0 99
0 2 40 53140 7888 1920116 0 0 0 36236 1395 251 0 2 0 98
0 2 40 53760 7888 1920116 0 0 0 36316 1359 53 0 2 0 99
0 2 40 54504 7888 1920116 0 0 0 36468 1356 55 0 2 0 98
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-07 1:02 ` Jason Schoonover
@ 2006-05-07 10:54 ` bert hubert
0 siblings, 0 replies; 40+ messages in thread
From: bert hubert @ 2006-05-07 10:54 UTC (permalink / raw)
To: Jason Schoonover; +Cc: linux-kernel
On Sat, May 06, 2006 at 06:02:47PM -0700, Jason Schoonover wrote:
> The interesting thing was that after I did a Ctrl-C on the ncftpget, the id
> column was still at 0, even though the ncftpget process was over. The id
> column was at 0 and the 'wa' column was at 98, up until all of the pdflush
> processes ended.
>
> Is that the expected behavior?
Yes - data is still being written out. 'wa' stands for waiting for io. As
long as 'us' and 'sy' are not 100 (together), your system ('computing
power') is not 'busy'.
The lines below are perfect:
> 0 2 40 47816 7888 1920116 0 0 0 36264 1354 56 0 1 0 99
> 0 2 40 48312 7888 1920116 0 0 0 36248 1362 52 0 1 0 99
Whether you should have 5 pdflushes running is something I have no relevant
experience with, but your system should function just fine during writeout.
--
http://www.PowerDNS.com Open source, database driven DNS Software
http://netherlabs.nl Open and Closed source services
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-05 17:10 Jason Schoonover
2006-05-06 23:03 ` bert hubert
@ 2006-05-07 16:50 ` Andrew Morton
2006-05-07 17:24 ` Jason Schoonover
` (2 more replies)
1 sibling, 3 replies; 40+ messages in thread
From: Andrew Morton @ 2006-05-07 16:50 UTC (permalink / raw)
To: Jason Schoonover; +Cc: linux-kernel
On Fri, 5 May 2006 10:10:19 -0700
Jason Schoonover <jasons@pioneer-pra.com> wrote:
> I'm having some problems on the latest 2.6.17-rc3 kernel and SCSI disk I/O.
> Whenever I copy any large file (over 500GB) the load average starts to slowly
> rise and after about a minute it is up to 7.5 and keeps on rising (depending
> on how long the file takes to copy). When I watch top, the processes at the
> top of the list are cp, pdflush, kjournald and kswapd.
This is probably because the number of pdflush threads slowly grows to its
maximum. This is bogus, and we seem to have broken it sometime in the past
few releases. I need to find a few quality hours to get in there and fix
it, but they're rare :(
It's pretty harmless though. The "load average" thing just means that the
extra pdflush threads are twiddling thumbs waiting on some disk I/O -
they'll later exit and clean themselves up. They won't be consuming
significant resources.
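For anyone who wants to watch this happen, the thread count and the writeback
thresholds involved are visible under /proc; a sketch using the 2.6-era
sysctl names:
  # current number of pdflush threads (2.6 keeps between 2 and 8 by default)
  cat /proc/sys/vm/nr_pdflush_threads
  # thresholds that decide when background writeback starts and when writers
  # are throttled
  cat /proc/sys/vm/dirty_background_ratio /proc/sys/vm/dirty_ratio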
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-07 16:50 ` Andrew Morton
@ 2006-05-07 17:24 ` Jason Schoonover
2006-05-08 11:13 ` Erik Mouw
2006-05-08 14:24 ` Martin J. Bligh
2 siblings, 0 replies; 40+ messages in thread
From: Jason Schoonover @ 2006-05-07 17:24 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
Hi Andrew,
I see; so it sounds as though the load average is not reporting the true load
of the machine. However, it still feels as though something is consuming quite
a bit of resources. The machine becomes almost unresponsive if I start two or
three copies at the same time.
I noticed the behavior initially when I installed VMware Server: I started one
of the VMs booting and was copying another in the background (about a 12GB
directory). The booting VM started getting slower and slower and
eventually just hung. It wasn't locked up, it just seemed like it was "paused."
When I tried an "ls" in another window, it just hung. I then tried to ssh
into the server to open another window and I couldn't even get an ssh prompt.
I eventually had to Ctrl-C the copy and wait for it to be done before I could
do anything. And the load average had skyrocketed, though the consensus here is
that it doesn't reflect the true load of the system.
Should I possibly revert to an older kernel? 2.6.12 or 2.6.10, maybe? Do
you know roughly when the I/O behavior was changed?
I can certainly help debug this issue if you (or someone else) has the time to
look into it and fix it. Otherwise I will just revert and hope that it
will get fixed in the future.
Thanks,
Jason
-----Original Message-----
From: Andrew Morton
Sent: Sunday 07 May 2006 09:50
To: Jason Schoonover
Subject: Re: High load average on disk I/O on 2.6.17-rc3
On Fri, 5 May 2006 10:10:19 -0700
Jason Schoonover <jasons@pioneer-pra.com> wrote:
> I'm having some problems on the latest 2.6.17-rc3 kernel and SCSI disk I/O.
> Whenever I copy any large file (over 500GB) the load average starts to
> slowly rise and after about a minute it is up to 7.5 and keeps on rising
> (depending on how long the file takes to copy). When I watch top, the
> processes at the top of the list are cp, pdflush, kjournald and kswapd.
This is probably because the number of pdflush threads slowly grows to its
maximum. This is bogus, and we seem to have broken it sometime in the past
few releases. I need to find a few quality hours to get in there and fix
it, but they're rare :(
It's pretty harmless though. The "load average" thing just means that the
extra pdflush threads are twiddling thumbs waiting on some disk I/O -
they'll later exit and clean themselves up. They won't be consuming
significant resources.
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-07 16:50 ` Andrew Morton
2006-05-07 17:24 ` Jason Schoonover
@ 2006-05-08 11:13 ` Erik Mouw
2006-05-08 11:22 ` Arjan van de Ven
2006-05-08 14:24 ` Martin J. Bligh
2 siblings, 1 reply; 40+ messages in thread
From: Erik Mouw @ 2006-05-08 11:13 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jason Schoonover, linux-kernel
On Sun, May 07, 2006 at 09:50:39AM -0700, Andrew Morton wrote:
> This is probably because the number of pdflush threads slowly grows to its
> maximum. This is bogus, and we seem to have broken it sometime in the past
> few releases. I need to find a few quality hours to get in there and fix
> it, but they're rare :(
>
> It's pretty harmless though. The "load average" thing just means that the
> extra pdflush threads are twiddling thumbs waiting on some disk I/O -
> they'll later exit and clean themselves up. They won't be consuming
> significant resources.
Not completely harmless. Some daemons (sendmail, exim) use the load
average to decide if they will allow more work. A local user could
create a mail DoS by just copying a couple of large files around.
Zeniv.linux.org.uk mail went down due to this. See
http://lkml.org/lkml/2006/3/28/70 .
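For context, those cutoffs are ordinary MTA options compared against the
load average; a sketch of where they typically live (paths and file names
vary per distro):
  # exim: deliver_queue_load_max / queue_only_load stop deliveries above a
  # given 1-minute load average
  grep -i load /etc/exim4/exim4.conf.template
  # sendmail: RefuseLA / QueueLA are the refuse and queue-only cutoffs
  grep -E 'RefuseLA|QueueLA' /etc/mail/sendmail.cf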
Erik
--
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-08 11:13 ` Erik Mouw
@ 2006-05-08 11:22 ` Arjan van de Ven
2006-05-08 11:28 ` Russell King
0 siblings, 1 reply; 40+ messages in thread
From: Arjan van de Ven @ 2006-05-08 11:22 UTC (permalink / raw)
To: Erik Mouw; +Cc: Andrew Morton, Jason Schoonover, linux-kernel
On Mon, 2006-05-08 at 13:13 +0200, Erik Mouw wrote:
> On Sun, May 07, 2006 at 09:50:39AM -0700, Andrew Morton wrote:
> > This is probably because the number of pdflush threads slowly grows to its
> > maximum. This is bogus, and we seem to have broken it sometime in the past
> > few releases. I need to find a few quality hours to get in there and fix
> > it, but they're rare :(
> >
> > It's pretty harmless though. The "load average" thing just means that the
> > extra pdflush threads are twiddling thumbs waiting on some disk I/O -
> > they'll later exit and clean themselves up. They won't be consuming
> > significant resources.
>
> Not completely harmless. Some daemons (sendmail, exim) use the load
> average to decide if they will allow more work.
and those need to be fixed most likely ;)
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-08 11:22 ` Arjan van de Ven
@ 2006-05-08 11:28 ` Russell King
2006-05-08 11:38 ` Avi Kivity
` (2 more replies)
0 siblings, 3 replies; 40+ messages in thread
From: Russell King @ 2006-05-08 11:28 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: Erik Mouw, Andrew Morton, Jason Schoonover, linux-kernel
On Mon, May 08, 2006 at 01:22:36PM +0200, Arjan van de Ven wrote:
> On Mon, 2006-05-08 at 13:13 +0200, Erik Mouw wrote:
> > On Sun, May 07, 2006 at 09:50:39AM -0700, Andrew Morton wrote:
> > > This is probably because the number of pdflush threads slowly grows to its
> > > maximum. This is bogus, and we seem to have broken it sometime in the past
> > > few releases. I need to find a few quality hours to get in there and fix
> > > it, but they're rare :(
> > >
> > > It's pretty harmless though. The "load average" thing just means that the
> > > extra pdflush threads are twiddling thumbs waiting on some disk I/O -
> > > they'll later exit and clean themselves up. They won't be consuming
> > > significant resources.
> >
> > Not completely harmless. Some daemons (sendmail, exim) use the load
> > average to decide if they will allow more work.
>
> and those need to be fixed most likely ;)
Why do you think that? exim uses the load average to work out whether
it's a good idea to spawn more copies of itself, and increase the load
on the machine.
Unfortunately though, under 2.6 kernels, the load average seems to be
a meaningless indication of how busy the system is from that point of
view.
Having a single-CPU machine with a load average of 150 that still feels
very interactive at the shell is extremely counter-intuitive.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-08 11:28 ` Russell King
@ 2006-05-08 11:38 ` Avi Kivity
2006-05-08 12:37 ` Arjan van de Ven
2006-05-09 14:37 ` Bill Davidsen
2 siblings, 0 replies; 40+ messages in thread
From: Avi Kivity @ 2006-05-08 11:38 UTC (permalink / raw)
To: rmk+lkml
Cc: Arjan van de Ven, Erik Mouw, Andrew Morton, Jason Schoonover,
linux-kernel
Russell King wrote:
> Why do you think that? exim uses the load average to work out whether
> it's a good idea to spawn more copies of itself, and increase the load
> on the machine.
>
> Unfortunately though, under 2.6 kernels, the load average seems to be
> a meaningless indication of how busy the system is from that point of
> view.
>
> Having a single-CPU machine with a load average of 150 that still feels
> very interactive at the shell is extremely counter-intuitive.
>
It's even worse: load average used to mean the number of runnable
processes plus the number of processes waiting on disk or NFS I/O to
complete, a fairly bogus measure as you have noted, but with the aio
interfaces one can issue enormous amounts of I/O without it being
counted in the load average at all.
To make such decisions meaningful, one needs separate counters for cpu
load and for disk load on the devices one is actually using.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-08 11:28 ` Russell King
2006-05-08 11:38 ` Avi Kivity
@ 2006-05-08 12:37 ` Arjan van de Ven
2006-05-09 14:37 ` Bill Davidsen
2 siblings, 0 replies; 40+ messages in thread
From: Arjan van de Ven @ 2006-05-08 12:37 UTC (permalink / raw)
To: Russell King; +Cc: linux-kernel, Jason Schoonover, Andrew Morton, Erik Mouw
On Mon, 2006-05-08 at 12:28 +0100, Russell King wrote:
> On Mon, May 08, 2006 at 01:22:36PM +0200, Arjan van de Ven wrote:
> > On Mon, 2006-05-08 at 13:13 +0200, Erik Mouw wrote:
> > > On Sun, May 07, 2006 at 09:50:39AM -0700, Andrew Morton wrote:
> > > > This is probably because the number of pdflush threads slowly grows to its
> > > > maximum. This is bogus, and we seem to have broken it sometime in the past
> > > > few releases. I need to find a few quality hours to get in there and fix
> > > > it, but they're rare :(
> > > >
> > > > It's pretty harmless though. The "load average" thing just means that the
> > > > extra pdflush threads are twiddling thumbs waiting on some disk I/O -
> > > > they'll later exit and clean themselves up. They won't be consuming
> > > > significant resources.
> > >
> > > Not completely harmless. Some daemons (sendmail, exim) use the load
> > > average to decide if they will allow more work.
> >
> > and those need to be fixed most likely ;)
>
> Why do you think that?
I think that because the characteristics of modern hardware don't make
"load" a good estimator for finding out if the hardware can take more
jobs.
To explain why I'm thinking this, I first need to draw an ASCII art graph:
100
 %  |                   ************************|
 b  |               ****
 u  |            ***
 s  |          **
 y  |         *
    |       *
    |     *
    |*
    +------------------------------------------
                   -> workload
On the Y axis is the percentage in use; on the horizontal axis, the amount
of work that is done (in the mail case, say, emails per second).
Modern hardware has an initial ramp-up which is near linear in terms of
workload versus use, but then a saturation area is reached at 100%
where, even though the system is 100% busy, more work can still be
added, up to a certain point that I showed with a "|". This is due to
the increased batching you get at higher utilization compared to lower
utilization: CPU caches, memory burst speed versus latency, and disk
streaming performance versus random seeks all create and widen this
saturation region to the right. And all of those have been improving in
hardware over the last 4+ years, with the result that the saturation
"reach" has extended much further to the right as well.
How does this tie into "load" and using load for what exim/sendmail use
it for? Well.... Today "load" is a somewhat poor approximation of this
percentage-in-use[1], but... as per the graph and argument above, even
if it were a perfect representation of that, it still would not be a good
measure of whether a system can do more work (per time unit) or not.
[1] I didn't discuss the use of *what*; in reality that is a combination
of cpu, memory, disk and possibly network resources. Load tries to
combine cpu and disk into one number via simple addition; that's an
obviously rough estimate and I'm not arguing that it's not rough.
> Having a single-CPU machine with a load average of 150 that still feels
> very interactive at the shell is extremely counter-intuitive.
Well it's also a sign that the cpu scheduler is prioritizing your shell
over the background "menial" work ;)
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-07 16:50 ` Andrew Morton
2006-05-07 17:24 ` Jason Schoonover
2006-05-08 11:13 ` Erik Mouw
@ 2006-05-08 14:24 ` Martin J. Bligh
2006-05-08 14:55 ` Arjan van de Ven
2 siblings, 1 reply; 40+ messages in thread
From: Martin J. Bligh @ 2006-05-08 14:24 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jason Schoonover, linux-kernel
> It's pretty harmless though. The "load average" thing just means that the
> extra pdflush threads are twiddling thumbs waiting on some disk I/O -
> they'll later exit and clean themselves up. They won't be consuming
> significant resources.
If they're waiting on disk I/O, they shouldn't be runnable, and thus
should not be counted as part of the load average, surely?
M.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-08 14:24 ` Martin J. Bligh
@ 2006-05-08 14:55 ` Arjan van de Ven
2006-05-08 15:22 ` Erik Mouw
0 siblings, 1 reply; 40+ messages in thread
From: Arjan van de Ven @ 2006-05-08 14:55 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Andrew Morton, Jason Schoonover, linux-kernel
On Mon, 2006-05-08 at 07:24 -0700, Martin J. Bligh wrote:
> > It's pretty harmless though. The "load average" thing just means that the
> > extra pdflush threads are twiddling thumbs waiting on some disk I/O -
> > they'll later exit and clean themselves up. They won't be consuming
> > significant resources.
>
> If they're waiting on disk I/O, they shouldn't be runnable, and thus
> should not be counted as part of the load average, surely?
yes they are, since at least a decade. "load average" != "cpu
utilisation" by any means. It's "tasks waiting for a hardware resource
to become available". CPU is one such resource (runnable) but disk is
another. There are more ...
think of load as "if I bought faster hardware this would improve"
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-08 14:55 ` Arjan van de Ven
@ 2006-05-08 15:22 ` Erik Mouw
2006-05-08 15:25 ` Martin J. Bligh
` (2 more replies)
0 siblings, 3 replies; 40+ messages in thread
From: Erik Mouw @ 2006-05-08 15:22 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Martin J. Bligh, Andrew Morton, Jason Schoonover, linux-kernel
On Mon, May 08, 2006 at 04:55:48PM +0200, Arjan van de Ven wrote:
> On Mon, 2006-05-08 at 07:24 -0700, Martin J. Bligh wrote:
> > > It's pretty harmless though. The "load average" thing just means that the
> > > extra pdflush threads are twiddling thumbs waiting on some disk I/O -
> > > they'll later exit and clean themselves up. They won't be consuming
> > > significant resources.
> >
> > If they're waiting on disk I/O, they shouldn't be runnable, and thus
> > should not be counted as part of the load average, surely?
>
> yes they are, since at least a decade. "load average" != "cpu
> utilisation" by any means. It's "tasks waiting for a hardware resource
> to become available". CPU is one such resource (runnable) but disk is
> another. There are more ...
... except that any kernel < 2.6 didn't account tasks waiting for disk
IO. Load average has always been somewhat related to tasks contending
for CPU power. It's easy to say "shrug, it changed, live with it", but
at least give applications that want to be nice to the system a way to
figure out the real cpu load.
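For reference, a minimal sketch of what such an application could do
today without looking at the load average at all: sample the aggregate
"cpu" line of /proc/stat twice and compute the busy fraction from the
deltas. This is an illustration, not something proposed in the thread;
it assumes the usual user, nice, system, idle, iowait, irq, softirq
field order of the 2.6-era /proc/stat:

#include <stdio.h>
#include <unistd.h>

/* Read the aggregate "cpu" line of /proc/stat (values in USER_HZ ticks). */
static int read_cpu(unsigned long long v[7])
{
	FILE *f = fopen("/proc/stat", "r");
	int n;

	if (!f)
		return -1;
	n = fscanf(f, "cpu %llu %llu %llu %llu %llu %llu %llu",
		   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6]);
	fclose(f);
	return n == 7 ? 0 : -1;
}

int main(void)
{
	unsigned long long a[7], b[7], busy = 0, total = 0;
	int i;

	if (read_cpu(a))
		return 1;
	sleep(5);
	if (read_cpu(b))
		return 1;
	for (i = 0; i < 7; i++) {
		total += b[i] - a[i];
		if (i != 3 && i != 4)	/* skip idle and iowait */
			busy += b[i] - a[i];
	}
	printf("cpu busy over the last 5s: %.1f%%\n",
	       total ? 100.0 * busy / total : 0.0);
	return 0;
}

Something like this gives a CPU-only figure an MTA could throttle on,
independent of how many pdflush threads happen to be sleeping in D state.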
Erik
--
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-08 15:22 ` Erik Mouw
@ 2006-05-08 15:25 ` Martin J. Bligh
2006-05-08 15:31 ` Arjan van de Ven
2006-05-08 22:24 ` Bernd Eckenfels
2 siblings, 0 replies; 40+ messages in thread
From: Martin J. Bligh @ 2006-05-08 15:25 UTC (permalink / raw)
To: Erik Mouw; +Cc: Arjan van de Ven, Andrew Morton, Jason Schoonover, linux-kernel
Erik Mouw wrote:
> On Mon, May 08, 2006 at 04:55:48PM +0200, Arjan van de Ven wrote:
>
>>On Mon, 2006-05-08 at 07:24 -0700, Martin J. Bligh wrote:
>>
>>>>It's pretty harmless though. The "load average" thing just means that the
>>>>extra pdflush threads are twiddling thumbs waiting on some disk I/O -
>>>>they'll later exit and clean themselves up. They won't be consuming
>>>>significant resources.
>>>
>>>If they're waiting on disk I/O, they shouldn't be runnable, and thus
>>>should not be counted as part of the load average, surely?
>>
>>yes they are, since at least a decade. "load average" != "cpu
>>utilisation" by any means. It's "tasks waiting for a hardware resource
>>to become available". CPU is one such resource (runnable) but disk is
>>another. There are more ...
>
>
> ... except that any kernel < 2.6 didn't account tasks waiting for disk
> IO. Load average has always been somewhat related to tasks contending
> for CPU power. It's easy to say "shrug, it changed, live with it", but
> at least give applications that want to be nice to the system a way to
> figure out the real cpu load.
I had a patch to create a real, per-cpu load average. I guess I'll dig
it out again, since it was also extremely useful for diagnosing
scheduler issues.
Maybe I'm confused about what the loadavg figure in Linux was in 2.6,
I'll go read the code again. Not sure it's very useful to provide only
a combined figure of all waiting tasks without separated versions as
well, really.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-08 15:22 ` Erik Mouw
2006-05-08 15:25 ` Martin J. Bligh
@ 2006-05-08 15:31 ` Arjan van de Ven
2006-05-08 15:42 ` Erik Mouw
2006-05-09 1:57 ` Nick Piggin
2006-05-08 22:24 ` Bernd Eckenfels
2 siblings, 2 replies; 40+ messages in thread
From: Arjan van de Ven @ 2006-05-08 15:31 UTC (permalink / raw)
To: Erik Mouw; +Cc: Martin J. Bligh, Andrew Morton, Jason Schoonover, linux-kernel
On Mon, 2006-05-08 at 17:22 +0200, Erik Mouw wrote:
> On Mon, May 08, 2006 at 04:55:48PM +0200, Arjan van de Ven wrote:
> > On Mon, 2006-05-08 at 07:24 -0700, Martin J. Bligh wrote:
> > > > It's pretty harmless though. The "load average" thing just means that the
> > > > extra pdflush threads are twiddling thumbs waiting on some disk I/O -
> > > > they'll later exit and clean themselves up. They won't be consuming
> > > > significant resources.
> > >
> > > If they're waiting on disk I/O, they shouldn't be runnable, and thus
> > > should not be counted as part of the load average, surely?
> >
> > yes they are, since at least a decade. "load average" != "cpu
> > utilisation" by any means. It's "tasks waiting for a hardware resource
> > to become available". CPU is one such resource (runnable) but disk is
> > another. There are more ...
>
> ... except that any kernel < 2.6 didn't account tasks waiting for disk
> IO.
they did. It was "D" state, which counted into load average.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-08 15:31 ` Arjan van de Ven
@ 2006-05-08 15:42 ` Erik Mouw
2006-05-08 16:02 ` Martin J. Bligh
` (3 more replies)
2006-05-09 1:57 ` Nick Piggin
1 sibling, 4 replies; 40+ messages in thread
From: Erik Mouw @ 2006-05-08 15:42 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Martin J. Bligh, Andrew Morton, Jason Schoonover, linux-kernel
On Mon, May 08, 2006 at 05:31:29PM +0200, Arjan van de Ven wrote:
> On Mon, 2006-05-08 at 17:22 +0200, Erik Mouw wrote:
> > ... except that any kernel < 2.6 didn't account tasks waiting for disk
> > IO.
>
> they did. It was "D" state, which counted into load average.
They did not or at least to a much lesser extent. That's the reason why
ZenIV.linux.org.uk had a mail DoS during the last FC release and why we
see load average questions on lkml.
I've seen it on our servers as well: when using 2.4 and doing 50 MB/s
to disk (through NFS), the load just was slightly above 0. When we
switched the servers to 2.6 it went to ~16 for the same disk usage.
Erik
--
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-08 15:42 ` Erik Mouw
@ 2006-05-08 16:02 ` Martin J. Bligh
2006-05-08 16:02 ` Miquel van Smoorenburg
` (2 subsequent siblings)
3 siblings, 0 replies; 40+ messages in thread
From: Martin J. Bligh @ 2006-05-08 16:02 UTC (permalink / raw)
To: Erik Mouw; +Cc: Arjan van de Ven, Andrew Morton, Jason Schoonover, linux-kernel
Erik Mouw wrote:
> On Mon, May 08, 2006 at 05:31:29PM +0200, Arjan van de Ven wrote:
>
>>On Mon, 2006-05-08 at 17:22 +0200, Erik Mouw wrote:
>>
>>>... except that any kernel < 2.6 didn't account tasks waiting for disk
>>>IO.
>>
>>they did. It was "D" state, which counted into load average.
>
>
> They did not or at least to a much lesser extent. That's the reason why
> ZenIV.linux.org.uk had a mail DoS during the last FC release and why we
> see load average questions on lkml.
>
> I've seen it on our servers as well: when using 2.4 and doing 50 MB/s
> to disk (through NFS), the load just was slightly above 0. When we
> switched the servers to 2.6 it went to ~16 for the same disk usage.
Looks like both count it, or something stranger is going on.
2.6.16:

static unsigned long count_active_tasks(void)
{
	return (nr_running() + nr_uninterruptible()) * FIXED_1;
}

2.4.0:

static unsigned long count_active_tasks(void)
{
	struct task_struct *p;
	unsigned long nr = 0;

	read_lock(&tasklist_lock);
	for_each_task(p) {
		if ((p->state == TASK_RUNNING ||
		     (p->state & TASK_UNINTERRUPTIBLE)))
			nr += FIXED_1;
	}
	read_unlock(&tasklist_lock);
	return nr;
}
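For completeness, the count above is only the sampled input; the reported
averages are exponentially damped in fixed point, roughly like this (from
the 2.6-era include/linux/sched.h and kernel/timer.c; constants quoted
from memory, so treat the exact values as approximate):

#define FSHIFT		11		/* nr of bits of precision */
#define FIXED_1		(1 << FSHIFT)	/* 1.0 as fixed point */
#define LOAD_FREQ	(5*HZ)		/* 5 sec intervals */
#define EXP_1		1884		/* 1/exp(5sec/1min) as fixed point */
#define EXP_5		2014		/* 1/exp(5sec/5min) */
#define EXP_15		2037		/* 1/exp(5sec/15min) */

#define CALC_LOAD(load, exp, n) \
	load *= exp; \
	load += n*(FIXED_1 - exp); \
	load >>= FSHIFT;

/* run every LOAD_FREQ ticks with n = count_active_tasks() */
CALC_LOAD(avenrun[0], EXP_1, n);
CALC_LOAD(avenrun[1], EXP_5, n);
CALC_LOAD(avenrun[2], EXP_15, n);

So the difference between 2.4 and 2.6 does not appear to be in this
arithmetic; what seems to have changed is how many tasks sit in
TASK_UNINTERRUPTIBLE at any instant (for example the extra pdflush
threads discussed earlier in the thread).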
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-08 15:42 ` Erik Mouw
2006-05-08 16:02 ` Martin J. Bligh
@ 2006-05-08 16:02 ` Miquel van Smoorenburg
2006-05-08 16:47 ` Russell King
2006-05-08 17:18 ` Mike Galbraith
3 siblings, 0 replies; 40+ messages in thread
From: Miquel van Smoorenburg @ 2006-05-08 16:02 UTC (permalink / raw)
To: linux-kernel
In article <20060508154217.GH1875@harddisk-recovery.com>,
Erik Mouw <erik@harddisk-recovery.com> wrote:
>On Mon, May 08, 2006 at 05:31:29PM +0200, Arjan van de Ven wrote:
>> On Mon, 2006-05-08 at 17:22 +0200, Erik Mouw wrote:
>> > ... except that any kernel < 2.6 didn't account tasks waiting for disk
>> > IO.
>>
>> they did. It was "D" state, which counted into load average.
>
>They did not or at least to a much lesser extent.
I just looked at the 2.4.9 (random 2.4 kernel) source code, and
kernel/timer.c::count_active_tasks(), which is what calculates the
load average, uses the same algorithm as in 2.6.16.
Mike.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-08 15:42 ` Erik Mouw
2006-05-08 16:02 ` Martin J. Bligh
2006-05-08 16:02 ` Miquel van Smoorenburg
@ 2006-05-08 16:47 ` Russell King
2006-05-08 17:04 ` Gabor Gombas
2006-05-08 17:18 ` Mike Galbraith
3 siblings, 1 reply; 40+ messages in thread
From: Russell King @ 2006-05-08 16:47 UTC (permalink / raw)
To: Erik Mouw
Cc: Arjan van de Ven, Martin J. Bligh, Andrew Morton,
Jason Schoonover, linux-kernel
On Mon, May 08, 2006 at 05:42:18PM +0200, Erik Mouw wrote:
> On Mon, May 08, 2006 at 05:31:29PM +0200, Arjan van de Ven wrote:
> > On Mon, 2006-05-08 at 17:22 +0200, Erik Mouw wrote:
> > > ... except that any kernel < 2.6 didn't account tasks waiting for disk
> > > IO.
> >
> > they did. It was "D" state, which counted into load average.
>
> They did not or at least to a much lesser extent. That's the reason why
> ZenIV.linux.org.uk had a mail DoS during the last FC release and why we
> see load average questions on lkml.
>
> I've seen it on our servers as well: when using 2.4 and doing 50 MB/s
> to disk (through NFS), the load just was slightly above 0. When we
> switched the servers to 2.6 it went to ~16 for the same disk usage.
It's actually rather interesting to look at 2.6 and load averages.
The load average appears to depend on the type of load, rather than
the real load. Let's look at three different cases:
1. while (1) { } loop in a C program.
Starting off with a load average of 0.00 with a "watch uptime" running
(and leaving that running for several minutes), then starting such a
program, and letting it run for one minute.
Result: load average at the end: 0.60
2. a program which runs continuously for 6 seconds and then sleeps for
54 seconds.
Result: load average peaks at 0.12, drops to 0.05 just before it
runs for its second 6-second burst.
ps initially reports 100% CPU usage, drops to 10% after one minute,
rises to 18% and drops back towards 10%, and gradually settles on
10%.
3. a program which runs for 1 second and then sleeps for 9 seconds.
Result: load average peaks at 0.22, drops to 0.15 just before it
runs for the next second.
ps reports 10% CPU usage.
Okay, so, two different CPU workloads without any other IO, using the
same total amount of CPU time every minute, seem to produce two very
different load averages. In addition, using 100% CPU for one minute
does not produce a load average of 1.
Seems to me that something's wrong somewhere. Either that or the first
load average number no longer represents the load over the past one
minute.
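For what it's worth, 0.60 after one minute is roughly what an
exponentially damped average predicts: with a 5 second sample period and
a one minute time constant, a task that is continuously runnable for 60
seconds starting from load 0.00 should reach about 1 - 1/e, i.e. roughly
0.63. A small userspace simulation of the fixed-point damping (a sketch
using the approximate 2.6-era constants, not code from the thread) shows
the ramp:

#include <stdio.h>

#define FSHIFT	11
#define FIXED_1	(1 << FSHIFT)
#define EXP_1	1884			/* ~1/exp(5sec/1min) in fixed point */

int main(void)
{
	unsigned long load = 0;		/* avenrun[0], starts at 0.00 */
	unsigned long n = FIXED_1;	/* one task runnable the whole time */
	int i;

	for (i = 1; i <= 12; i++) {	/* 12 samples x 5s = one minute */
		load = (load * EXP_1 + n * (FIXED_1 - EXP_1)) >> FSHIFT;
		printf("after %2ds: %.2f\n", i * 5, (double)load / FIXED_1);
	}
	return 0;
}

This ends near 0.63 after 60 seconds, which suggests the first loadavg
field is a damped average with a one minute time constant rather than a
plain average over the past minute, so it never quite reaches 1 after
only one minute of load.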
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-08 16:47 ` Russell King
@ 2006-05-08 17:04 ` Gabor Gombas
0 siblings, 0 replies; 40+ messages in thread
From: Gabor Gombas @ 2006-05-08 17:04 UTC (permalink / raw)
To: Erik Mouw, Arjan van de Ven, Martin J. Bligh, Andrew Morton,
Jason Schoonover, linux-kernel
On Mon, May 08, 2006 at 05:47:05PM +0100, Russell King wrote:
> Seems to me that somethings wrong somewhere. Either that or the first
> load average number no longer represents the load over the past one
> minute.
... or the load average statistics and the CPU usage statistics are just
computed using different algorithms and thus estimate different things.
At least the sampling frequency is surely different.
Gabor
--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-08 15:42 ` Erik Mouw
` (2 preceding siblings ...)
2006-05-08 16:47 ` Russell King
@ 2006-05-08 17:18 ` Mike Galbraith
3 siblings, 0 replies; 40+ messages in thread
From: Mike Galbraith @ 2006-05-08 17:18 UTC (permalink / raw)
To: Erik Mouw
Cc: Arjan van de Ven, Martin J. Bligh, Andrew Morton,
Jason Schoonover, linux-kernel
On Mon, 2006-05-08 at 17:42 +0200, Erik Mouw wrote:
> On Mon, May 08, 2006 at 05:31:29PM +0200, Arjan van de Ven wrote:
> > On Mon, 2006-05-08 at 17:22 +0200, Erik Mouw wrote:
> > > ... except that any kernel < 2.6 didn't account tasks waiting for disk
> > > IO.
> >
> > they did. It was "D" state, which counted into load average.
>
> They did not or at least to a much lesser extent. That's the reason why
> ZenIV.linux.org.uk had a mail DoS during the last FC release and why we
> see load average questions on lkml.
I distinctly recall it counting, but since I don't have a 2.4 tree
handy, I'll refrain from saying "did _too_" ;-)
> I've seen it on our servers as well: when using 2.4 and doing 50 MB/s
> to disk (through NFS), the load just was slightly above 0. When we
> switched the servers to 2.6 it went to ~16 for the same disk usage.
The main difference I see is...
8129 root 15 0 3500 512 432 D 56.0 0.0 0:33.72 bonnie
1393 root 10 -5 0 0 0 D 0.4 0.0 0:00.26 kjournald
8135 root 15 0 0 0 0 D 0.0 0.0 0:00.01 pdflush
573 root 15 0 0 0 0 D 0.0 0.0 0:00.00 pdflush
574 root 15 0 0 0 0 D 0.0 0.0 0:00.04 pdflush
8131 root 15 0 0 0 0 D 0.0 0.0 0:00.01 pdflush
8141 root 15 0 0 0 0 D 0.0 0.0 0:00.00 pdflush
With 2.4, there was only one flush thread. Same load, different
loadavg... in this particular case of one user task running. IIRC, if
you had a bunch of things running that ran you low on memory, you could
end up with a slew of 'D' state tasks in 2.4 as well, because allocating
tasks had to help free memory by flushing buffers and pushing swap. Six
of one, half a dozen of the other.
-Mike
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-08 15:22 ` Erik Mouw
2006-05-08 15:25 ` Martin J. Bligh
2006-05-08 15:31 ` Arjan van de Ven
@ 2006-05-08 22:24 ` Bernd Eckenfels
2006-05-08 22:39 ` Lee Revell
` (2 more replies)
2 siblings, 3 replies; 40+ messages in thread
From: Bernd Eckenfels @ 2006-05-08 22:24 UTC (permalink / raw)
To: linux-kernel
Erik Mouw <erik@harddisk-recovery.com> wrote:
> ... except that any kernel < 2.6 didn't account tasks waiting for disk
> IO. Load average has always been somewhat related to tasks contending
> for CPU power.
Actually all Linux kernels accounted for disk waits, while others, like
the BSD-based ones, did not. It is a very old Linux oddity.
Gruss
Bernd
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-08 22:24 ` Bernd Eckenfels
@ 2006-05-08 22:39 ` Lee Revell
2006-05-09 0:08 ` Peter Williams
2006-05-09 18:33 ` Bill Davidsen
2 siblings, 0 replies; 40+ messages in thread
From: Lee Revell @ 2006-05-08 22:39 UTC (permalink / raw)
To: Bernd Eckenfels; +Cc: linux-kernel
On Tue, 2006-05-09 at 00:24 +0200, Bernd Eckenfels wrote:
> Erik Mouw <erik@harddisk-recovery.com> wrote:
> > ... except that any kernel < 2.6 didn't account tasks waiting for disk
> > IO. Load average has always been somewhat related to tasks contending
> > for CPU power.
>
> Actually all Linux kernels accounted for disk waits, while others, like
> the BSD-based ones, did not. It is a very old Linux oddity.
Maybe I am misunderstanding, but IIRC BSD/OS also counted processes
waiting on IO towards the load average.
Lee
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-08 22:24 ` Bernd Eckenfels
2006-05-08 22:39 ` Lee Revell
@ 2006-05-09 0:08 ` Peter Williams
2006-05-09 18:33 ` Bill Davidsen
2 siblings, 0 replies; 40+ messages in thread
From: Peter Williams @ 2006-05-09 0:08 UTC (permalink / raw)
To: linux-kernel
Bernd Eckenfels wrote:
> Erik Mouw <erik@harddisk-recovery.com> wrote:
>> ... except that any kernel < 2.6 didn't account tasks waiting for disk
>> IO. Load average has always been somewhat related to tasks contending
>> for CPU power.
>
> Actually all Linux kernels accounted for disk waits, while others, like
> the BSD-based ones, did not. It is a very old Linux oddity.
Personally, I see both types of load estimates (i.e. CPU only and CPU
plus IO wait) as useful. Why can't we have both? The cost would be
minimal.
Peter
--
Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-08 15:31 ` Arjan van de Ven
2006-05-08 15:42 ` Erik Mouw
@ 2006-05-09 1:57 ` Nick Piggin
2006-05-09 2:02 ` Martin Bligh
2006-05-09 4:36 ` Arjan van de Ven
1 sibling, 2 replies; 40+ messages in thread
From: Nick Piggin @ 2006-05-09 1:57 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Erik Mouw, Martin J. Bligh, Andrew Morton, Jason Schoonover,
linux-kernel
Arjan van de Ven wrote:
>>... except that any kernel < 2.6 didn't account tasks waiting for disk
>>IO.
>>
>
>they did. It was "D" state, which counted into load average.
>
Perhaps kernel threads in D state should not contribute toward load avg.
Userspace does not care whether there are 2 or 20 pdflush threads waiting
for IO. However, when the network/disks can no longer keep up, userspace
processes will end up going to sleep in writeback or reclaim -- *that* is
when we start feeling the load.
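One possible shape for that (purely illustrative, written against the
2.4-style loop quoted earlier in the thread; 2.6 actually keeps
per-runqueue nr_uninterruptible counters rather than walking the task
list, so a real patch would look different) would be to skip tasks that
have no user address space, since kernel threads normally run with a
NULL mm:

	/* hypothetical: don't let kernel threads in D state inflate loadavg */
	for_each_task(p) {
		if (p->state == TASK_RUNNING)
			nr += FIXED_1;
		else if ((p->state & TASK_UNINTERRUPTIBLE) && p->mm != NULL)
			nr += FIXED_1;
	}

Whether that is desirable is a separate question; it would also hide
cases where kernel threads in D state really do indicate saturated disks.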
--
Send instant messages to your online friends http://au.messenger.yahoo.com
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-09 1:57 ` Nick Piggin
@ 2006-05-09 2:02 ` Martin Bligh
2006-05-09 2:16 ` Nick Piggin
2006-05-09 4:36 ` Arjan van de Ven
1 sibling, 1 reply; 40+ messages in thread
From: Martin Bligh @ 2006-05-09 2:02 UTC (permalink / raw)
To: Nick Piggin
Cc: Arjan van de Ven, Erik Mouw, Andrew Morton, Jason Schoonover,
linux-kernel
Nick Piggin wrote:
> Arjan van de Ven wrote:
>
>>> ... except that any kernel < 2.6 didn't account tasks waiting for disk
>>> IO.
>>>
>>
>> they did. It was "D" state, which counted into load average.
>>
>
> Perhaps kernel threads in D state should not contribute toward load avg.
>
> Userspace does not care whether there are 2 or 20 pdflush threads waiting
> for IO. However, when the network/disks can no longer keep up, userspace
> processes will end up going to sleep in writeback or reclaim -- *that* is
> when we start feeling the load.
Personally I'd be far happier having separated counters for both. Then
we can see what the real bottleneck is. Whilst we're at it, on a per-cpu
and per-elevator-queue basis ;-)
M.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-09 2:02 ` Martin Bligh
@ 2006-05-09 2:16 ` Nick Piggin
0 siblings, 0 replies; 40+ messages in thread
From: Nick Piggin @ 2006-05-09 2:16 UTC (permalink / raw)
To: Martin Bligh
Cc: Arjan van de Ven, Erik Mouw, Andrew Morton, Jason Schoonover,
linux-kernel
Martin Bligh wrote:
> Nick Piggin wrote:
>
>>
>> Perhaps kernel threads in D state should not contribute toward load avg.
>>
>> Userspace does not care whether there are 2 or 20 pdflush threads
>> waiting
>> for IO. However, when the network/disks can no longer keep up, userspace
>> processes will end up going to sleep in writeback or reclaim --
>> *that* is
>> when we start feeling the load.
>
>
> Personally I'd be far happier having separated counters for both.
Well so long as userspace never blocks, blocked kernel threads aren't a
bottleneck (OK, perhaps things like nfsd are an exception, but kernel
threads doing async work on behalf of userspace, like pdflush or kswapd,
are not).
It is something simple we can do today that might decouple the kernel
implementation (eg. of pdflush) from the load average reporting.
> Then
> we can see what the real bottleneck is. Whilst we're at it, on a per-cpu
> and per-elevator-queue basis ;-)
Might be helpful, yes. At least separate counters for CPU and IO... but
that doesn't mean the global loadavg is going away.
--
Send instant messages to your online friends http://au.messenger.yahoo.com
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-09 1:57 ` Nick Piggin
2006-05-09 2:02 ` Martin Bligh
@ 2006-05-09 4:36 ` Arjan van de Ven
2006-05-09 4:46 ` Nick Piggin
2006-05-09 5:03 ` David Lang
1 sibling, 2 replies; 40+ messages in thread
From: Arjan van de Ven @ 2006-05-09 4:36 UTC (permalink / raw)
To: Nick Piggin
Cc: Erik Mouw, Martin J. Bligh, Andrew Morton, Jason Schoonover,
linux-kernel
On Tue, 2006-05-09 at 11:57 +1000, Nick Piggin wrote:
> Arjan van de Ven wrote:
>
> >>... except that any kernel < 2.6 didn't account tasks waiting for disk
> >>IO.
> >>
> >
> >they did. It was "D" state, which counted into load average.
> >
>
> Perhaps kernel threads in D state should not contribute toward load avg
that would be a change from, well... a LONG time
The question is what "load" means; if you want to change that... then
there are even better metrics possible. Like
"number of processes wanting to run + number of busy spindles + number
of busy nics + number of VM zones that are below the problem
watermark" (where "busy" means "queue full")
or 50 million other definitions. If we're going to change the meaning,
we might as well give it a "real" meaning.
(And even then it is NOT a good measure for determining if the machine
can perform more work, the graph I put in a previous mail is very real,
and in practice it seems the saturation line is easily 4x or 5x of the
"linear" point)
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-09 4:36 ` Arjan van de Ven
@ 2006-05-09 4:46 ` Nick Piggin
2006-05-09 5:27 ` Hua Zhong
2006-05-09 5:03 ` David Lang
1 sibling, 1 reply; 40+ messages in thread
From: Nick Piggin @ 2006-05-09 4:46 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Erik Mouw, Martin J. Bligh, Andrew Morton, Jason Schoonover,
linux-kernel
Arjan van de Ven wrote:
>On Tue, 2006-05-09 at 11:57 +1000, Nick Piggin wrote:
>
>>Perhaps kernel threads in D state should not contribute toward load avg
>>
>
>that would be a change from, well... a LONG time
>
But presently it changes all the time when we change the implementation
of pdflush or kswapd.
If we make pdflush threads blk_congestion_wait for twice as long, and
end up creating twice as many to feed the same amount of IO, our load
magically doubles but the machine is under almost exactly the same
load condition.
Back when we didn't have all these kernel threads doing work for us,
that wasn't an issue.
>
>The question is what "load" means; if you want to change that... then
>there are even better metrics possible. Like
>"number of processes wanting to run + number of busy spindles + number
>of busy nics + number of VM zones that are below the problem
>watermark" (where "busy" means "queue full")
>
>or 50 million other definitions. If we're going to change the meaning,
>we might as well give it a "real" meaning.
>
I'm not sure if that is any better, and perhaps even worse. It does not
matter that much if VM zones are under a watermark if kswapd is taking
care of the problem and nothing ever blocks on memory IO.
(Sure kswapd will contribute to CPU usage, but that *will* be reflected
in load average)
>
>(And even then it is NOT a good measure for determining if the machine
>can perform more work, the graph I put in a previous mail is very real,
>and in practice it seems the saturation line is easily 4x or 5x of the
>"linear" point)
>
A global loadavg isn't too good anyway; as everyone has observed, there
are many independent resources. But it isn't going away while apps still
use it, so my point is that this might be an easy way to improve it.
--
Send instant messages to your online friends http://au.messenger.yahoo.com
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-09 4:36 ` Arjan van de Ven
2006-05-09 4:46 ` Nick Piggin
@ 2006-05-09 5:03 ` David Lang
2006-05-15 7:46 ` Sander
1 sibling, 1 reply; 40+ messages in thread
From: David Lang @ 2006-05-09 5:03 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Nick Piggin, Erik Mouw, Martin J. Bligh, Andrew Morton,
Jason Schoonover, linux-kernel
On Tue, 9 May 2006, Arjan van de Ven wrote:
> On Tue, 2006-05-09 at 11:57 +1000, Nick Piggin wrote:
>> Arjan van de Ven wrote:
>>
>>>> ... except that any kernel < 2.6 didn't account tasks waiting for disk
>>>> IO.
>>>>
>>>
>>> they did. It was "D" state, which counted into load average.
>>>
>>
>> Perhaps kernel threads in D state should not contribute toward load avg
>
> that would be a change from, well... a LONG time
>
> The question is what "load" means; if you want to change that... then
> there are even better metrics possible. Like
> "number of processes wanting to run + number of busy spindles + number
> of busy nics + number of VM zones that are below the problem
> watermark" (where "busy" means "queue full")
>
> or 50 million other definitions. If we're going to change the meaning,
> we might as well give it a "real" meaning.
>
> (And even then it is NOT a good measure for determining if the machine
> can perform more work, the graph I put in a previous mail is very real,
> and in practice it seems the saturation line is easily 4x or 5x of the
> "linear" point)
While this is true, it's also true that up in this area it's very easy
for a spike of activity to cascade through the box and bring everything
down to its knees. (I've seen a production box go from 'acceptable'
response time to being effectively down for two hours, with a small
'tar' command (10's of K of writes) being the trigger that pushed it
over the edge.)

In general loadavg > 2x #procs has been a good indication that the box
is in danger and needs careful watching. I don't know when Linux changed
its loadavg calculation, but within the last several years there was a
change that caused the loadavg to report higher for the same amount of
activity on the box. As a user it's hard to argue which is the more
'correct' value.
Of the various functions that you mentioned above:

# processes wanting to run:
Gives a good indication of whether the CPU is the bottleneck. This is
what people think loadavg means (the textbooks may be wrong, but they're
what people learn from).

# spindles busy:
Gives a good indication of whether the disks are the bottleneck. This
needs to cover seek time and read/write time. My initial reaction is to
base this on the avg # of outstanding requests to the drive, but I'm not
sure how this would interact with TCQ/NCQ (it may just be that people
need to know their drives, and know that a higher value for those drives
is acceptable). This is one that I don't know how to find today (wait
time won't show if something else keeps the cpu busy). In many ways this
stat should be per-drive as well as any summary value (you can't just
start using another spindle the way you can just use another cpu, even
in a NUMA system :-)

# NICs busy:
Don't bother with this; the networking folks have been tracking this for
years, either locally on the box or through the networking
infrastructure (mrtg and friends were built for this).

# VM zones below the danger point:
I'm not sure about this one either; in practice, watching for paging
rates to climb seems to work, but this area is where black magic
monitoring is in full force (and the rate of change in the VM doesn't
help this understanding).
I can understand your reluctance to quickly tinker with the loadavg
calculation, but would it be possible to make the other values available
by themselves for a while? Then people can experiment in userspace to
find the best way to combine the values into a single, nicely graphable
'health of the box' value.
David Lang
P.S. I would love to be told that I'm just ignorant of how to monitor
these things independently. It would make my life much easier to learn
how.
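For the spindle question specifically, much of this is visible today
through /proc/diskstats (new in 2.6): among the per-device counters is
the number of milliseconds spent doing I/O, and the delta of that over
wall-clock time is essentially the "%util" figure iostat reports. A
rough sketch (an illustration only, assuming the standard 2.6 diskstats
layout of eleven counters after major/minor/name, with "sda" just a
placeholder device name):

#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return milliseconds the named device has spent doing I/O, or -1. */
static long long io_ms(const char *dev)
{
	FILE *f = fopen("/proc/diskstats", "r");
	char line[256], name[32];
	long long v[11], ret = -1;
	int maj, min, n;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		n = sscanf(line,
		    "%d %d %31s %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld",
		    &maj, &min, name, &v[0], &v[1], &v[2], &v[3], &v[4],
		    &v[5], &v[6], &v[7], &v[8], &v[9], &v[10]);
		if (n == 14 && strcmp(name, dev) == 0) {
			ret = v[9];	/* 10th counter: ms spent doing I/O */
			break;
		}
	}
	fclose(f);
	return ret;
}

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "sda";
	long long a = io_ms(dev), b;

	sleep(5);
	b = io_ms(dev);
	if (a < 0 || b < 0)
		return 1;
	/* (b - a) ms of I/O over 5000 ms of wall clock -> percent busy */
	printf("%s: about %.1f%% busy over the last 5s\n", dev, (b - a) / 50.0);
	return 0;
}

For the paging question, the si/so and bi/bo columns vmstat already
prints cover that ground without involving the load average.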
--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare
^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: High load average on disk I/O on 2.6.17-rc3
2006-05-09 4:46 ` Nick Piggin
@ 2006-05-09 5:27 ` Hua Zhong
0 siblings, 0 replies; 40+ messages in thread
From: Hua Zhong @ 2006-05-09 5:27 UTC (permalink / raw)
To: 'Nick Piggin', 'Arjan van de Ven'
Cc: 'Erik Mouw', 'Martin J. Bligh',
'Andrew Morton', 'Jason Schoonover', linux-kernel
> A global loadavg isn't too good anyway, as everyone has
> observed, there are many independant resources. But my point
> is that it isn't going away while apps still use it, so my
> point is that this might be an easy way to improve it.
It's not just those MTAs using it. Worse, many watchdog implementations
use it too, and they will reload the box if the load is too high.
So we do need some ways to make the loadavg more meaningful or at least more predictable.
Hua
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-08 11:28 ` Russell King
2006-05-08 11:38 ` Avi Kivity
2006-05-08 12:37 ` Arjan van de Ven
@ 2006-05-09 14:37 ` Bill Davidsen
2 siblings, 0 replies; 40+ messages in thread
From: Bill Davidsen @ 2006-05-09 14:37 UTC (permalink / raw)
To: Arjan van de Ven, Erik Mouw, Andrew Morton, Jason Schoonover,
linux-kernel
Russell King wrote:
> On Mon, May 08, 2006 at 01:22:36PM +0200, Arjan van de Ven wrote:
>> On Mon, 2006-05-08 at 13:13 +0200, Erik Mouw wrote:
>>> On Sun, May 07, 2006 at 09:50:39AM -0700, Andrew Morton wrote:
>>>> This is probably because the number of pdflush threads slowly grows to its
>>>> maximum. This is bogus, and we seem to have broken it sometime in the past
>>>> few releases. I need to find a few quality hours to get in there and fix
>>>> it, but they're rare :(
>>>>
>>>> It's pretty harmless though. The "load average" thing just means that the
>>>> extra pdflush threads are twiddling thumbs waiting on some disk I/O -
>>>> they'll later exit and clean themselves up. They won't be consuming
>>>> significant resources.
>>> Not completely harmless. Some daemons (sendmail, exim) use the load
>>> average to decide if they will allow more work.
>> and those need to be fixed most likely ;)
>
> Why do you think that? exim uses the load average to work out whether
> it's a good idea to spawn more copies of itself, and increase the load
> on the machine.
>
> Unfortunately though, under 2.6 kernels, the load average seems to be
> a meaningless indication of how busy the system is from that point of
> view.
>
> Having a single-CPU machine with a load average of 150 that still feels
> very interactive at the shell is extremely counter-intuitive.
>
The thing which is important is runnable (as in "want the CPU now")
processes. I've seen the L.A. that high on other systems which were
running fine, AIX and OpenDesktop to name two. It's not just a Linux
thing.
--
-bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-08 22:24 ` Bernd Eckenfels
2006-05-08 22:39 ` Lee Revell
2006-05-09 0:08 ` Peter Williams
@ 2006-05-09 18:33 ` Bill Davidsen
2 siblings, 0 replies; 40+ messages in thread
From: Bill Davidsen @ 2006-05-09 18:33 UTC (permalink / raw)
To: Bernd Eckenfels, Linux Kernel Mailing List
Bernd Eckenfels wrote:
> Erik Mouw <erik@harddisk-recovery.com> wrote:
>> ... except that any kernel < 2.6 didn't account tasks waiting for disk
>> IO. Load average has always been somewhat related to tasks contending
>> for CPU power.
>
> Actually all Linux kernels accounted for disk waits, while others, like
> the BSD-based ones, did not. It is a very old Linux oddity.
Well, sort of. The current numbers are counting kernel threads against
load average, and before there were kernel threads, that clearly didn't
happen. So what you say is true, but it's only part of the truth.
--
-bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: High load average on disk I/O on 2.6.17-rc3
2006-05-09 5:03 ` David Lang
@ 2006-05-15 7:46 ` Sander
0 siblings, 0 replies; 40+ messages in thread
From: Sander @ 2006-05-15 7:46 UTC (permalink / raw)
To: David Lang
Cc: Arjan van de Ven, Nick Piggin, Erik Mouw, Martin J. Bligh,
Andrew Morton, Jason Schoonover, linux-kernel
David Lang wrote (ao):
> P.S. I would love to be told that I'm just ignorant of how to monitor
> these things independantly. it would make my life much easier to learn
> how.
I use vmstat and top together to determine if a system is (too) busy or
not.
vmstat gives a good indication about the amount of disk IO (both regular
IO and swap), and top sorted by cpu or memory shows which processes are
responsible for the numbers.
You as a human can interpret what you see. I don't think software should
do anything with the load average, as it does not reflect the actual
usage of the hardware.
With kind regards, Sander
--
Humilis IT Services and Solutions
http://www.humilis.net
^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread

Thread overview: 40+ messages
[not found] <69c8K-3Bu-57@gated-at.bofh.it>
2006-05-05 23:12 ` High load average on disk I/O on 2.6.17-rc3 Robert Hancock
2006-05-06 4:39 ` Jason Schoonover
2006-05-06 17:20 ` Robert Hancock
2006-05-06 18:23 ` Jason Schoonover
2006-05-06 20:01 ` Robert Hancock
2006-05-05 17:10 Jason Schoonover
2006-05-06 23:03 ` bert hubert
2006-05-07 1:02 ` Jason Schoonover
2006-05-07 10:54 ` bert hubert
2006-05-07 16:50 ` Andrew Morton
2006-05-07 17:24 ` Jason Schoonover
2006-05-08 11:13 ` Erik Mouw
2006-05-08 11:22 ` Arjan van de Ven
2006-05-08 11:28 ` Russell King
2006-05-08 11:38 ` Avi Kivity
2006-05-08 12:37 ` Arjan van de Ven
2006-05-09 14:37 ` Bill Davidsen
2006-05-08 14:24 ` Martin J. Bligh
2006-05-08 14:55 ` Arjan van de Ven
2006-05-08 15:22 ` Erik Mouw
2006-05-08 15:25 ` Martin J. Bligh
2006-05-08 15:31 ` Arjan van de Ven
2006-05-08 15:42 ` Erik Mouw
2006-05-08 16:02 ` Martin J. Bligh
2006-05-08 16:02 ` Miquel van Smoorenburg
2006-05-08 16:47 ` Russell King
2006-05-08 17:04 ` Gabor Gombas
2006-05-08 17:18 ` Mike Galbraith
2006-05-09 1:57 ` Nick Piggin
2006-05-09 2:02 ` Martin Bligh
2006-05-09 2:16 ` Nick Piggin
2006-05-09 4:36 ` Arjan van de Ven
2006-05-09 4:46 ` Nick Piggin
2006-05-09 5:27 ` Hua Zhong
2006-05-09 5:03 ` David Lang
2006-05-15 7:46 ` Sander
2006-05-08 22:24 ` Bernd Eckenfels
2006-05-08 22:39 ` Lee Revell
2006-05-09 0:08 ` Peter Williams
2006-05-09 18:33 ` Bill Davidsen