* Strange CPU occupation...
@ 2007-10-31 12:38 BERTRAND Joël
2007-11-01 9:25 ` Strange CPU occupation... and system hangs BERTRAND Joël
2007-11-07 8:55 ` Strange CPU occupation Goswin von Brederlow
0 siblings, 2 replies; 3+ messages in thread
From: BERTRAND Joël @ 2007-10-31 12:38 UTC (permalink / raw)
To: sparclinux, linux-raid, iscsitarget-devel
Hello,
I'm looking for a bug in iSCSI target code, but I have found this
morning a new bug that is certainly related to mine...
Please consider these raid volumes:
Root gershwin:[/etc] > cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[2](F) md_d0p1[0]
1464725632 blocks [2/1] [U_]
md_d0 : active raid5 sdc1[0] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1]
1464725760 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
md6 : active raid1 sda1[0] sdb1[1]
7815552 blocks [2/2] [UU]
md5 : active raid1 sda8[0] sdb8[1]
14538752 blocks [2/2] [UU]
md4 : active raid1 sda7[0] sdb7[1]
4883648 blocks [2/2] [UU]
md3 : active raid1 sda6[0] sdb6[1]
9767424 blocks [2/2] [UU]
md2 : active raid1 sda5[0] sdb5[1]
29294400 blocks [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[1]
489856 blocks [2/2] [UU]
md0 : active raid1 sdb4[1] sda4[0]
4883648 blocks [2/2] [UU]
unused devices: <none>
Root gershwin:[/etc] >
md7 only has one disk because I cannot synchronize it over iSCSI. But
without any message, load average of this server (24 threads T1000)
increases until more than 9. top returns :
top - 13:36:08 up 4 days, 1:00, 3 users, load average: 9.23, 8.46, 6.26
Tasks: 252 total, 5 running, 246 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.0%us, 4.2%sy, 0.0%ni, 87.4%id, 8.4%wa, 0.0%hi, 0.0%si,
0.0%st
Mem: 4139024k total, 4115920k used, 23104k free, 743976k buffers
Swap: 7815536k total, 304k used, 7815232k free, 2188048k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5426 root 15 -5 0 0 0 R 100 0.0 46:32.54
md_d0_raid5
17215 root 20 0 3120 1552 1112 R 1 0.0 0:01.38 top
1 root 20 0 2576 960 816 S 0 0.0 0:09.74 init
2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd
3 root RT -5 0 0 0 S 0 0.0 0:00.18
migration/0
4 root 15 -5 0 0 0 S 0 0.0 0:00.18
ksoftirqd/0
and some process are in D state :
Root gershwin:[/etc] > ps auwx | grep D
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 270 0.0 0.0 0 0 ? D Oct27 1:17 [pdflush]
root 3676 0.9 0.0 0 0 ? D Oct27 56:03 [nfsd]
root 5435 0.0 0.0 0 0 ? D< Oct27 3:16 [md7_raid1]
root 5438 0.0 0.0 0 0 ? D< Oct27 1:01 [kjournald]
root 5440 0.0 0.0 0 0 ? D< Oct27 0:33 [loop0]
root 5441 0.0 0.0 0 0 ? D< Oct27 0:05 [kjournald]
root 16442 0.0 0.0 20032 1208 pts/2 D+ 13:23 0:00 iftop
-i eth2
Why md7_raid is in D state ? Same question about iftop ?
Regards,
JKB
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Strange CPU occupation... and system hangs
2007-10-31 12:38 Strange CPU occupation BERTRAND Joël
@ 2007-11-01 9:25 ` BERTRAND Joël
2007-11-07 8:55 ` Strange CPU occupation Goswin von Brederlow
1 sibling, 0 replies; 3+ messages in thread
From: BERTRAND Joël @ 2007-11-01 9:25 UTC (permalink / raw)
To: sparclinux, linux-raid, iscsitarget-devel
BERTRAND Joël wrote:
<snip>
> and some process are in D state :
> Root gershwin:[/etc] > ps auwx | grep D
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 270 0.0 0.0 0 0 ? D Oct27 1:17 [pdflush]
> root 3676 0.9 0.0 0 0 ? D Oct27 56:03 [nfsd]
> root 5435 0.0 0.0 0 0 ? D< Oct27 3:16 [md7_raid1]
> root 5438 0.0 0.0 0 0 ? D< Oct27 1:01 [kjournald]
> root 5440 0.0 0.0 0 0 ? D< Oct27 0:33 [loop0]
> root 5441 0.0 0.0 0 0 ? D< Oct27 0:05 [kjournald]
> root 16442 0.0 0.0 20032 1208 pts/2 D+ 13:23 0:00 iftop
> -i eth2
>
> Why md7_raid is in D state ? Same question about iftop ?
Some bad news... After ten or eleven hours, kernel crashes on this
server. The last top screen is :
top - 04:59:46 up 4 days, 16:24, 3 users, load average: 19.72, 19.22,
19.05
Tasks: 285 total, 5 running, 279 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.0%us, 4.2%sy, 0.0%ni, 68.5%id, 27.3%wa, 0.0%hi, 0.0%si,
0.0%st
Mem: 4139024k total, 4130800k used, 8224k free, 38984k buffers
Swap: 7815536k total, 304k used, 7815232k free, 79056k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5426 root 15 -5 0 0 0 R 100 0.0 970:17.21 md_d0_raid5
26923 root 20 0 3120 1568 1112 R 2 0.0 13:32.24 top
...
I have rebooted. I don't have any message in log files. I don't have
any screen but I haven't seen anything on serial console. In ker.log, I
have :
Oct 31 15:36:15 gershwin kernel: swapper: page allocation failure.
order:2, mode:0x4020
Oct 31 15:36:15 gershwin kernel: Call Trace:
Oct 31 15:36:15 gershwin kernel: [00000000004b6568]
__slab_alloc+0x1b0/0x720
Oct 31 15:36:15 gershwin kernel: [00000000004b87a8]
__kmalloc_track_caller+0xb0/0xe0
Oct 31 15:36:15 gershwin kernel: [0000000000601d68] __alloc_skb+0x50/0x120
Oct 31 15:36:15 gershwin kernel: [0000000000642ee0]
tcp_collapse+0x1e8/0x440
Oct 31 15:36:15 gershwin kernel: [0000000000643298]
tcp_prune_queue+0x160/0x3a0
Oct 31 15:36:15 gershwin kernel: [0000000000643d08]
tcp_data_queue+0x830/0xde0
Oct 31 15:36:15 gershwin kernel: [0000000000645d74]
tcp_rcv_established+0x35c/0x840
Oct 31 15:36:15 gershwin kernel: [000000000064cf7c]
tcp_v4_do_rcv+0xe4/0x4a0
Oct 31 15:36:15 gershwin kernel: [000000000064fdd8] tcp_v4_rcv+0xb00/0xb20
Oct 31 15:36:15 gershwin kernel: [000000000062e2ac]
ip_local_deliver+0x194/0x3a0
Oct 31 15:36:15 gershwin kernel: [000000000062dd98] ip_rcv+0x360/0x6e0
Oct 31 15:36:15 gershwin kernel: [0000000000607f64]
netif_receive_skb+0x1ec/0x480
Oct 31 15:36:15 gershwin kernel: [00000000005a5fe0] tg3_poll+0x6c8/0xc40
Oct 31 15:36:15 gershwin kernel: [000000000060a940]
net_rx_action+0x88/0x160
Oct 31 15:36:15 gershwin kernel: [0000000000468078] __do_softirq+0x80/0x100
Oct 31 15:36:15 gershwin kernel: [000000000046815c] do_softirq+0x64/0x80
Oct 31 15:36:15 gershwin kernel: Mem-info:
Oct 31 15:36:15 gershwin kernel: Normal per-cpu:
Oct 31 15:36:15 gershwin kernel: CPU 0: Hot: hi: 90, btch: 15 usd:
15 Cold: hi: 30, btch: 7 usd: 5
Oct 31 15:36:15 gershwin kernel: CPU 1: Hot: hi: 90, btch: 15 usd:
31 Cold: hi: 30, btch: 7 usd: 4
Oct 31 15:36:15 gershwin kernel: CPU 2: Hot: hi: 90, btch: 15 usd:
4 Cold: hi: 30, btch: 7 usd: 3
Oct 31 15:36:15 gershwin kernel: CPU 3: Hot: hi: 90, btch: 15 usd:
82 Cold: hi: 30, btch: 7 usd: 2
Oct 31 15:36:15 gershwin kernel: CPU 4: Hot: hi: 90, btch: 15 usd:
84 Cold: hi: 30, btch: 7 usd: 0
Oct 31 15:36:15 gershwin kernel: CPU 5: Hot: hi: 90, btch: 15 usd:
65 Cold: hi: 30, btch: 7 usd: 4
Oct 31 15:36:15 gershwin kernel: CPU 6: Hot: hi: 90, btch: 15 usd:
85 Cold: hi: 30, btch: 7 usd: 6
Oct 31 15:36:15 gershwin kernel: CPU 7: Hot: hi: 90, btch: 15 usd:
69 Cold: hi: 30, btch: 7 usd: 4
Oct 31 15:36:15 gershwin kernel: CPU 8: Hot: hi: 90, btch: 15 usd:
11 Cold: hi: 30, btch: 7 usd: 5
Oct 31 15:36:15 gershwin kernel: CPU 9: Hot: hi: 90, btch: 15 usd:
75 Cold: hi: 30, btch: 7 usd: 1
Oct 31 15:36:15 gershwin kernel: CPU 10: Hot: hi: 90, btch: 15 usd:
84 Cold: hi: 30, btch: 7 usd: 2
Oct 31 15:36:15 gershwin kernel: CPU 11: Hot: hi: 90, btch: 15 usd:
13 Cold: hi: 30, btch: 7 usd: 1
Oct 31 15:36:15 gershwin kernel: CPU 12: Hot: hi: 90, btch: 15 usd:
17 Cold: hi: 30, btch: 7 usd: 23
Oct 31 15:36:15 gershwin kernel: CPU 13: Hot: hi: 90, btch: 15 usd:
7 Cold: hi: 30, btch: 7 usd: 25
Oct 31 15:36:15 gershwin kernel: CPU 14: Hot: hi: 90, btch: 15 usd:
64 Cold: hi: 30, btch: 7 usd: 27
Oct 31 15:36:15 gershwin kernel: CPU 15: Hot: hi: 90, btch: 15 usd:
12 Cold: hi: 30, btch: 7 usd: 6
Oct 31 15:36:15 gershwin kernel: CPU 16: Hot: hi: 90, btch: 15 usd:
2 Cold: hi: 30, btch: 7 usd: 1
Oct 31 15:36:15 gershwin kernel: CPU 17: Hot: hi: 90, btch: 15 usd:
80 Cold: hi: 30, btch: 7 usd: 1
Oct 31 15:36:15 gershwin kernel: CPU 18: Hot: hi: 90, btch: 15 usd:
4 Cold: hi: 30, btch: 7 usd: 17
Oct 31 15:36:15 gershwin kernel: CPU 19: Hot: hi: 90, btch: 15 usd:
58 Cold: hi: 30, btch: 7 usd: 1
Oct 31 15:36:16 gershwin kernel: CPU 20: Hot: hi: 90, btch: 15 usd:
13 Cold: hi: 30, btch: 7 usd: 4
Oct 31 15:36:16 gershwin kernel: CPU 21: Hot: hi: 90, btch: 15 usd:
87 Cold: hi: 30, btch: 7 usd: 2
Oct 31 15:36:16 gershwin kernel: CPU 22: Hot: hi: 90, btch: 15 usd:
77 Cold: hi: 30, btch: 7 usd: 6
Oct 31 15:36:16 gershwin kernel: CPU 23: Hot: hi: 90, btch: 15 usd:
10 Cold: hi: 30, btch: 7 usd: 2
Oct 31 15:36:16 gershwin kernel: Active:72131 inactive:275943 dirty:6433
writeback:1041 unstable:0
Oct 31 15:36:16 gershwin kernel: free:2187 slab:131941 mapped:1732
pagetables:258 bounce:0
Oct 31 15:36:16 gershwin kernel: Normal free:17496kB min:8144kB
low:10176kB high:12216kB active:577048kB inactive:2207544kB
present:4149440kB pages_scanned:0 all_unreclaimable? no
Oct 31 15:36:16 gershwin kernel: lowmem_reserve[]: 0 0
Oct 31 15:36:16 gershwin kernel: Normal: 1786*8kB 173*16kB 0*32kB 1*64kB
0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB = 17632kB
Oct 31 15:36:16 gershwin kernel: Swap cache: add 39, delete 39, find
0/0, race 0+0
Oct 31 15:36:16 gershwin kernel: Free swap = 7815232kB
Oct 31 15:36:16 gershwin kernel: Free swap = 7815232kB
Oct 31 15:36:16 gershwin kernel: Total swap = 7815536kB
Oct 31 15:36:16 gershwin kernel: Free swap: 7815232kB
Oct 31 15:36:17 gershwin kernel: 524260 pages of RAM
Oct 31 15:36:17 gershwin kernel: 6882 reserved pages
Oct 31 15:36:17 gershwin kernel: 278965 pages shared
Oct 31 15:36:17 gershwin kernel: 0 pages swap cached
Oct 31 15:36:17 gershwin kernel: 6435 pages dirty
Oct 31 15:36:17 gershwin kernel: 1041 pages writeback
Oct 31 15:36:17 gershwin kernel: 1732 pages mapped
Oct 31 15:36:17 gershwin kernel: 131941 pages slab
Oct 31 15:36:17 gershwin kernel: 258 pages pagetables
Any idea ?
Regards,
JKB
-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Strange CPU occupation...
2007-10-31 12:38 Strange CPU occupation BERTRAND Joël
2007-11-01 9:25 ` Strange CPU occupation... and system hangs BERTRAND Joël
@ 2007-11-07 8:55 ` Goswin von Brederlow
1 sibling, 0 replies; 3+ messages in thread
From: Goswin von Brederlow @ 2007-11-07 8:55 UTC (permalink / raw)
To: BERTRAND Joel; +Cc: sparclinux, linux-raid, iscsitarget-devel
BERTRAND Joël <joel.bertrand@systella.fr> writes:
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>
> 5426 root 15 -5 0 0 0 R 100 0.0 46:32.54
> md_d0_raid5
First: You can tune the stripe cache.
Secondly: You might want the raid speedup patches that implement
prioritized queueing. That reduces the number of incomplete stripes
written to disk and thus speeds up things a lot.
Thirdly: If you have multiple cores then you probably want patches to
multithread the raid5 code. With just one thread it is ultimately cpu
bound and just won't get faster. XORing and maintainance overhead
seems to be awfully cpu intensive.
Fourth: Learn to live with it. Software Raid5/6 is expensive.
> and some process are in D state :
> Root gershwin:[/etc] > ps auwx | grep D
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 270 0.0 0.0 0 0 ? D Oct27 1:17 [pdflush]
> root 3676 0.9 0.0 0 0 ? D Oct27 56:03 [nfsd]
> root 5435 0.0 0.0 0 0 ? D< Oct27 3:16 [md7_raid1]
> root 5438 0.0 0.0 0 0 ? D< Oct27 1:01 [kjournald]
> root 5440 0.0 0.0 0 0 ? D< Oct27 0:33 [loop0]
> root 5441 0.0 0.0 0 0 ? D< Oct27 0:05 [kjournald]
> root 16442 0.0 0.0 20032 1208 pts/2 D+ 13:23 0:00 iftop
> -i eth2
>
> Why md7_raid is in D state ? Same question about iftop ?
I would say it is just waiting for the underlying devices to finish
some I/O request. And iftop will be in some kernel call waiting for
something or other.
MfG
Goswin
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2007-11-07 8:55 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-31 12:38 Strange CPU occupation BERTRAND Joël
2007-11-01 9:25 ` Strange CPU occupation... and system hangs BERTRAND Joël
2007-11-07 8:55 ` Strange CPU occupation Goswin von Brederlow
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).