From: "BERTRAND Joël" <joel.bertrand@systella.fr>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Bill Davidsen <davidsen@tmr.com>,
linux-raid@vger.kernel.org, sparclinux@vger.kernel.org,
iscsitarget-devel@lists.sourceforge.net
Subject: Re: [BUG] Raid1/5 over iSCSI trouble
Date: Sat, 20 Oct 2007 10:05:50 +0200
Message-ID: <4719B6DE.9040403@systella.fr>
In-Reply-To: <1192828331.30976.26.camel@dwillia2-linux.ch.intel.com>
Dan Williams wrote:
> On Fri, 2007-10-19 at 14:04 -0700, BERTRAND Joël wrote:
>> Sorry for this last mail. I have found another mistake, but I don't
>> know if this bug comes from iscsi-target or raid5 itself. iSCSI target
>> is disconnected because istd1 and md_d0_raid5 kernel threads use 100%
>> of CPU each !
>>
>> Tasks: 235 total,   6 running, 227 sleeping,   0 stopped,   2 zombie
>> Cpu(s):  0.1%us, 12.5%sy,  0.0%ni, 87.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>> Mem:   4139032k total,   218424k used,  3920608k free,    10136k buffers
>> Swap:  7815536k total,        0k used,  7815536k free,    64808k cached
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>  5824 root      15  -5     0    0    0 R  100  0.0  10:34.25 istd1
>>  5599 root      15  -5     0    0    0 R  100  0.0   7:25.43 md_d0_raid5
When iSCSI works fine:
Tasks: 231 total,   2 running, 229 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us,  2.5%sy,  0.0%ni, 95.7%id,  0.1%wa,  0.0%hi,  1.5%si,  0.0%st
Mem:   4139032k total,  4126064k used,    12968k free,    94680k buffers
Swap:  7815536k total,        0k used,  7815536k free,  3758776k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9774 root      15  -5     0    0    0 R   40  0.0   2:00.34 istd1
 9738 root      15  -5     0    0    0 S    9  0.0   2:06.56 md_d0_raid5
 4129 root      20   0 41648 5024 2432 S    6  0.1   2:46.39 fail2ban-server
 9830 root      20   0  3248 1544 1120 R    1  0.0   0:00.18 top
 4063 root      20   0  7424 5288  832 S    1  0.1   0:00.84 unfsd
 9776 root      15  -5     0    0    0 D    1  0.0   0:00.82 istiod1
 9780 root      15  -5     0    0    0 D    1  0.0   0:00.96 istiod1
 9782 root      15  -5     0    0    0 D    1  0.0   0:01.10 istiod1
    1 root      20   0  2576  960  816 S    0  0.0   0:01.56 init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0
After a random amount of time (the iSCSI target is not disconnected but
no longer answers initiator requests):
Tasks: 232 total,   5 running, 226 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.1%us,  7.9%sy,  0.0%ni, 91.6%id,  0.0%wa,  0.1%hi,  0.2%si,  0.0%st
Mem:   4139032k total,  4125912k used,    13120k free,    95640k buffers
Swap:  7815536k total,        0k used,  7815536k free,  3758792k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9738 root      15  -5     0    0    0 R  100  0.0   3:56.57 md_d0_raid5
 9739 root      15  -5     0    0    0 D   14  0.0   0:20.34 md_d0_resync
 9845 root      20   0  3248 1544 1120 R    1  0.0   0:07.00 top
 4129 root      20   0 41648 5024 2432 S    0  0.1   2:55.94 fail2ban-server
    1 root      20   0  2576  960  816 S    0  0.0   0:01.58 init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0
    4 root      15  -5     0    0    0 S    0  0.0   0:00.02 ksoftirqd/0
    5 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/1
    6 root      15  -5     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1
You can see a very strange thing... When I booted this server, md_d0
was clean. When the bug occurs, md_d0_resync starts (/dev/md/d0p1 is
part of my raid1 array). Why? This partition is not mounted on the
local server; it is only exported over iSCSI.
After disconnection of the iSCSI target:
Tasks: 232 total,   7 running, 224 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.0%us, 15.2%sy,  0.0%ni, 84.3%id,  0.0%wa,  0.1%hi,  0.3%si,  0.0%st
Mem:   4139032k total,  4127584k used,    11448k free,    95752k buffers
Swap:  7815536k total,        0k used,  7815536k free,  3758792k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9738 root      15  -5     0    0    0 R  100  0.0   4:56.82 md_d0_raid5
 9774 root      15  -5     0    0    0 R  100  0.0   5:52.41 istd1
 9739 root      15  -5     0    0    0 R   14  0.0   0:28.90 md_d0_resync
 9916 root      20   0  3248 1544 1120 R    2  0.0   0:00.56 top
 4129 root      20   0 41648 5024 2432 S    0  0.1   2:56.17 fail2ban-server
    1 root      20   0  2576  960  816 S    0  0.0   0:01.58 init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0
    4 root      15  -5     0    0    0 S    0  0.0   0:00.02 ksoftirqd/0
    5 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/1
    6 root      15  -5     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1
> What is the output of:
> cat /proc/5824/wchan
> cat /proc/5599/wchan
Root poulenc:[/usr/scripts] > cat /proc/9738/wchan
_startRoot poulenc:[/usr/scripts] > cat /proc/9774/wchan
_startRoot poulenc:[/usr/scripts] > vmstat -a
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   inact active   si   so    bi    bo   in   cs us sy  id wa
 5  0      0  10824 3777528 112280    0    0     7    19   12   19  0  0 100  0
Root poulenc:[/usr/scripts] > vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy  id wa
 5  0      0  10928  95856 3756880    0    0     7    19   12   19  0  0 100  0
Root poulenc:[/usr/scripts] > vmstat -s
4139032 K total memory
4127864 K used memory
112216 K active memory
3777568 K inactive memory
11168 K free memory
95928 K buffer memory
3756896 K swap cache
7815536 K total swap
0 K used swap
7815536 K free swap
26901 non-nice user cpu ticks
824 nice user cpu ticks
204746 system cpu ticks
94245668 idle cpu ticks
14378 IO-wait cpu ticks
3086 IRQ cpu ticks
33971 softirq cpu ticks
0 stolen cpu ticks
6555730 pages paged in
18136571 pages paged out
0 pages swapped in
0 pages swapped out
11259263 interrupts
18167358 CPU context switches
1192827483 boot time
9962 forks
Root poulenc:[/usr/scripts] > vmstat -d
disk- ---------------reads--------------- ---------------writes-------------- ---IO---
        total   merged   sectors        ms   total   merged   sectors        ms  cur  sec
sda    716720   143247  94849012   2617628    6732    24789    269070    222236    0  532
sdb    103590    23780   6140736     85244  409226   308936  88160014  13352564    0  929
md0     17469        0    456250         0    4557        0     36456         0    0    0
sdc    265108  2103743  37883308   2810656  266586   272237   8767696    628236    0  825
sdd    266248  2099943  37844236   2801400  264081   275321   8781088    609140    0  824
sde    263660  2104487  37875132   2835548  262296   276561   8776000    595140    0  826
sdf    283262  2084095  37862108   2432988  262197   277305   8785600    581008    0  779
sdg    285205  2082611  37870324   2291464  260836   278822   8791456    567908    0  752
sdh    291773  2072874  37817788   1892320  260572   278182   8775472    550688    0  685
loop0       0        0         0         0       0        0         0         0    0    0
loop1       0        0         0         0       0        0         0         0    0    0
loop2       0        0         0         0       0        0         0         0    0    0
loop3       0        0         0         0       0        0         0         0    0    0
loop4       0        0         0         0       0        0         0         0    0    0
loop5       0        0         0         0       0        0         0         0    0    0
loop6       0        0         0         0       0        0         0         0    0    0
loop7       0        0         0         0       0        0         0         0    0    0
md6        31        0       496         0       0        0         0         0    0    0
md1      4326        0    161366         0      27        0       110         0    0    0
md2    206279        0   4713706         0   14670        0    118752         0    0    0
md3      6709        0    392442         0    9964        0     80040         0    0    0
disk- ---------------reads--------------- ---------------writes-------------- ---IO---
        total   merged   sectors        ms   total   merged   sectors        ms  cur  sec
md4       247        0      3746         0     131        0      1208         0    0    0
md5     63245        0   7365546         0     292        0      2424         0    0    0
md_d0      14        0       216         0  642029        0  36004104         0    0    0
Root poulenc:[/usr/scripts] >
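A side note on reading wchan: /proc/<pid>/wchan holds a single symbol name with no trailing newline, so a bare cat leaves the next shell prompt on the same line as the value. The following is only a minimal sketch, assuming a Linux /proc filesystem; the helper name print_wchan is hypothetical, and the PIDs are the ones from the session above.

```shell
# Print "<pid>: <wchan>" on its own line for each PID given.
# print_wchan is an illustrative helper name, not an existing tool.
print_wchan() {
    for pid in "$@"; do
        if [ -r "/proc/$pid/wchan" ]; then
            # Command substitution captures the value; printf adds the newline.
            printf '%s: %s\n' "$pid" "$(cat "/proc/$pid/wchan")"
        else
            printf '%s: (wchan not readable)\n' "$pid"
        fi
    done
}

print_wchan 9738 9774
```

Running it repeatedly (for example under watch) may show whether the reported symbol changes while a thread is spinning; that is a suggestion, not something tested on this server.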
Please note that the zombie processes are not significant on this
server. It runs watchdog, and the zombie process count is always
between 0 and 2. When the iSCSI target hangs, the load average is:
load average: 14.03, 13.63, 10.47
with only md_d0_raid5, istd1 and md_d0_resync as running processes.
 9774 root      15  -5     0    0    0 R  100  0.0  18:17.63 istd1
 9738 root      15  -5     0    0    0 R  100  0.0  17:22.04 md_d0_raid5
 9739 root      15  -5     0    0    0 R   14  0.0   2:15.18 md_d0_resync
I won't reboot this server, in case you need more information.
Regards,
JKB
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Thread overview: 36+ messages
2007-10-16 13:24 [BUG] Raid5 trouble BERTRAND Joël
2007-10-17 14:32 ` BERTRAND Joël
2007-10-17 14:58 ` Dan Williams
2007-10-17 15:40 ` Dan Williams
2007-10-17 16:44 ` BERTRAND Joël
2007-10-18 0:46 ` Dan Williams
2007-10-18 8:29 ` BERTRAND Joël
2007-10-19 2:55 ` Bill Davidsen
2007-10-19 8:04 ` BERTRAND Joël
2007-10-19 15:51 ` Dan Williams
2007-10-19 16:03 ` BERTRAND Joël
[not found] ` <4718DE66.8000905@tmr.com>
2007-10-19 20:42 ` BERTRAND Joël
2007-10-19 20:49 ` [BUG] Raid1/5 over iSCSI trouble BERTRAND Joël
2007-10-19 21:02 ` [Iscsitarget-devel] " Ross S. W. Walker
2007-10-19 21:06 ` BERTRAND Joël
2007-10-19 21:10 ` Ross S. W. Walker
2007-10-20 7:45 ` BERTRAND Joël
2007-10-19 21:11 ` [Iscsitarget-devel] " Scott Kaelin
2007-10-19 21:04 ` BERTRAND Joël
2007-10-19 21:08 ` Ross S. W. Walker
2007-10-19 21:12 ` Dan Williams
2007-10-20 8:05 ` BERTRAND Joël [this message]
2007-10-24 7:12 ` BERTRAND Joël
2007-10-24 20:10 ` Bill Davidsen
2007-10-24 23:49 ` Dan Williams
2007-10-25 0:03 ` David Miller
2007-10-27 13:29 ` BERTRAND Joël
2007-10-27 18:27 ` Dan Williams
2007-10-27 19:35 ` BERTRAND Joël
2007-10-27 21:13 ` Ming Zhang
2007-10-29 10:40 ` BERTRAND Joël
2007-10-19 21:19 ` Ming Zhang
2007-10-19 23:50 ` Bill Davidsen
2007-10-19 23:58 ` Bill Davidsen
2007-10-20 7:52 ` BERTRAND Joël
2007-10-17 16:07 ` [BUG] Raid5 trouble BERTRAND Joël