linux-raid.vger.kernel.org archive mirror
From: "BERTRAND Joël" <joel.bertrand@systella.fr>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Bill Davidsen <davidsen@tmr.com>,
	linux-raid@vger.kernel.org, sparclinux@vger.kernel.org,
	iscsitarget-devel@lists.sourceforge.net
Subject: Re: [BUG] Raid1/5 over iSCSI trouble
Date: Sat, 20 Oct 2007 10:05:50 +0200
Message-ID: <4719B6DE.9040403@systella.fr>
In-Reply-To: <1192828331.30976.26.camel@dwillia2-linux.ch.intel.com>

Dan Williams wrote:
> On Fri, 2007-10-19 at 14:04 -0700, BERTRAND Joël wrote:
>>         Sorry for this last mail. I have found another mistake, but I
>> don't know if this bug comes from iscsi-target or raid5 itself. iSCSI
>> target is disconnected because istd1 and md_d0_raid5 kernel threads
>> use 100% of CPU each!
>>
>> Tasks: 235 total,   6 running, 227 sleeping,   0 stopped,   2 zombie
>> Cpu(s):  0.1%us, 12.5%sy,  0.0%ni, 87.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>> Mem:   4139032k total,   218424k used,  3920608k free,    10136k buffers
>> Swap:  7815536k total,        0k used,  7815536k free,    64808k cached
>>
>>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>   5824 root      15  -5     0    0    0 R  100  0.0  10:34.25 istd1
>>   5599 root      15  -5     0    0    0 R  100  0.0   7:25.43 md_d0_raid5

	When iSCSI works fine:

Tasks: 231 total,   2 running, 229 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us,  2.5%sy,  0.0%ni, 95.7%id,  0.1%wa,  0.0%hi,  1.5%si,  0.0%st
Mem:   4139032k total,  4126064k used,    12968k free,    94680k buffers
Swap:  7815536k total,        0k used,  7815536k free,  3758776k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  9774 root      15  -5     0    0    0 R   40  0.0   2:00.34 istd1
  9738 root      15  -5     0    0    0 S    9  0.0   2:06.56 md_d0_raid5
  4129 root      20   0 41648 5024 2432 S    6  0.1   2:46.39 fail2ban-server
  9830 root      20   0  3248 1544 1120 R    1  0.0   0:00.18 top
  4063 root      20   0  7424 5288  832 S    1  0.1   0:00.84 unfsd
  9776 root      15  -5     0    0    0 D    1  0.0   0:00.82 istiod1
  9780 root      15  -5     0    0    0 D    1  0.0   0:00.96 istiod1
  9782 root      15  -5     0    0    0 D    1  0.0   0:01.10 istiod1
     1 root      20   0  2576  960  816 S    0  0.0   0:01.56 init
     2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
     3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0

After a random amount of time (the iSCSI target is not disconnected, but it
no longer answers initiator requests):

Tasks: 232 total,   5 running, 226 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.1%us,  7.9%sy,  0.0%ni, 91.6%id,  0.0%wa,  0.1%hi,  0.2%si,  0.0%st
Mem:   4139032k total,  4125912k used,    13120k free,    95640k buffers
Swap:  7815536k total,        0k used,  7815536k free,  3758792k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  9738 root      15  -5     0    0    0 R  100  0.0   3:56.57 md_d0_raid5
  9739 root      15  -5     0    0    0 D   14  0.0   0:20.34 md_d0_resync
  9845 root      20   0  3248 1544 1120 R    1  0.0   0:07.00 top
  4129 root      20   0 41648 5024 2432 S    0  0.1   2:55.94 fail2ban-server
     1 root      20   0  2576  960  816 S    0  0.0   0:01.58 init
     2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
     3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0
     4 root      15  -5     0    0    0 S    0  0.0   0:00.02 ksoftirqd/0
     5 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/1
     6 root      15  -5     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1

	Note something very strange: when I booted this server, md_d0 was
clean. When the bug occurs, md_d0_resync is started (/dev/md/d0p1 is part
of my RAID1 array). Why? This partition is not mounted on the local
server; it is only exported over iSCSI.

After the iSCSI target is disconnected:

Tasks: 232 total,   7 running, 224 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.0%us, 15.2%sy,  0.0%ni, 84.3%id,  0.0%wa,  0.1%hi,  0.3%si,  0.0%st
Mem:   4139032k total,  4127584k used,    11448k free,    95752k buffers
Swap:  7815536k total,        0k used,  7815536k free,  3758792k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  9738 root      15  -5     0    0    0 R  100  0.0   4:56.82 md_d0_raid5
  9774 root      15  -5     0    0    0 R  100  0.0   5:52.41 istd1
  9739 root      15  -5     0    0    0 R   14  0.0   0:28.90 md_d0_resync
  9916 root      20   0  3248 1544 1120 R    2  0.0   0:00.56 top
  4129 root      20   0 41648 5024 2432 S    0  0.1   2:56.17 fail2ban-server
     1 root      20   0  2576  960  816 S    0  0.0   0:01.58 init
     2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
     3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0
     4 root      15  -5     0    0    0 S    0  0.0   0:00.02 ksoftirqd/0
     5 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/1
     6 root      15  -5     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1
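The %CPU that top shows for istd1 and md_d0_raid5 comes from the utime and stime counters (fields 14 and 15, in clock ticks) of /proc/<pid>/stat. A minimal sketch (Python, not from the original report, only approximating top's arithmetic) samples that figure for given PIDs, which can be handy for logging how long a thread stays at 100% without keeping top open:

```python
import os
import sys
import time
from pathlib import Path

CLK_TCK = os.sysconf("SC_CLK_TCK")  # clock ticks per second, often 100


def cpu_ticks(stat_line):
    """Return utime + stime from one /proc/<pid>/stat line."""
    # The command name (field 2) may contain spaces, so split only after
    # the last ')'. rest[0] is field 3 (state), so field n is rest[n - 3].
    rest = stat_line[stat_line.rindex(")") + 2 :].split()
    return int(rest[11]) + int(rest[12])  # fields 14 (utime), 15 (stime)


def cpu_percent(before, after, interval, hz=CLK_TCK):
    """Share of one CPU used between two samples, as a percentage."""
    return 100.0 * (after - before) / (hz * interval)


def sample(pid, interval=1.0, proc_root="/proc"):
    """Read the counters twice, `interval` seconds apart."""
    stat = Path(proc_root) / str(pid) / "stat"
    before = cpu_ticks(stat.read_text())
    time.sleep(interval)
    after = cpu_ticks(stat.read_text())
    return cpu_percent(before, after, interval)


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(f"{arg}: {sample(int(arg)):.0f}% of one CPU")
```

Run with 9738 9774 as arguments on this server, it would sample md_d0_raid5 and istd1 for one second each.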

> What is the output of:
> cat /proc/5824/wchan
> cat /proc/5599/wchan

Root poulenc:[/usr/scripts] > cat /proc/9738/wchan
_start
Root poulenc:[/usr/scripts] > cat /proc/9774/wchan
_start
Root poulenc:[/usr/scripts] > vmstat -a
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy  id wa
  5  0      0  10824 3777528 112280    0    0     7    19   12   19  0  0 100  0
Root poulenc:[/usr/scripts] > vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy  id wa
  5  0      0  10928  95856 3756880    0    0     7    19   12   19  0  0 100  0
Root poulenc:[/usr/scripts] >  vmstat -s
       4139032 K total memory
       4127864 K used memory
        112216 K active memory
       3777568 K inactive memory
         11168 K free memory
         95928 K buffer memory
       3756896 K swap cache
       7815536 K total swap
             0 K used swap
       7815536 K free swap
         26901 non-nice user cpu ticks
           824 nice user cpu ticks
        204746 system cpu ticks
      94245668 idle cpu ticks
         14378 IO-wait cpu ticks
          3086 IRQ cpu ticks
         33971 softirq cpu ticks
             0 stolen cpu ticks
       6555730 pages paged in
      18136571 pages paged out
             0 pages swapped in
             0 pages swapped out
      11259263 interrupts
      18167358 CPU context switches
    1192827483 boot time
          9962 forks
Root poulenc:[/usr/scripts] > vmstat -d
disk- ------------reads------------ ------------writes----------- -----IO------
        total merged sectors      ms  total merged sectors      ms    cur    sec
sda   716720 143247 94849012 2617628   6732  24789  269070  222236      0    532
sdb   103590  23780 6140736   85244 409226 308936 88160014 13352564      0    929
md0    17469      0  456250       0   4557      0   36456       0      0      0
sdc   265108 2103743 37883308 2810656 266586 272237 8767696  628236      0    825
sdd   266248 2099943 37844236 2801400 264081 275321 8781088  609140      0    824
sde   263660 2104487 37875132 2835548 262296 276561 8776000  595140      0    826
sdf   283262 2084095 37862108 2432988 262197 277305 8785600  581008      0    779
sdg   285205 2082611 37870324 2291464 260836 278822 8791456  567908      0    752
sdh   291773 2072874 37817788 1892320 260572 278182 8775472  550688      0    685
loop0      0      0       0       0      0      0       0       0      0      0
loop1      0      0       0       0      0      0       0       0      0      0
loop2      0      0       0       0      0      0       0       0      0      0
loop3      0      0       0       0      0      0       0       0      0      0
loop4      0      0       0       0      0      0       0       0      0      0
loop5      0      0       0       0      0      0       0       0      0      0
loop6      0      0       0       0      0      0       0       0      0      0
loop7      0      0       0       0      0      0       0       0      0      0
md6       31      0     496       0      0      0       0       0      0      0
md1     4326      0  161366       0     27      0     110       0      0      0
md2   206279      0 4713706       0  14670      0  118752       0      0      0
md3     6709      0  392442       0   9964      0   80040       0      0      0
disk- ------------reads------------ ------------writes----------- -----IO------
        total merged sectors      ms  total merged sectors      ms    cur    sec
md4      247      0    3746       0    131      0    1208       0      0      0
md5    63245      0 7365546       0    292      0    2424       0      0      0
md_d0     14      0     216       0 642029      0 36004104       0      0      0
Root poulenc:[/usr/scripts] >
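Kernel thread PIDs change on every boot (5824 and 5599 in the first report, 9774 and 9738 here), so the wchan lookups can also be done by thread name. Note that /proc/<pid>/wchan has no trailing newline, which is why, read with cat, its value lands on the same line as the next shell prompt. A minimal Python sketch, not part of the original exchange, assuming the standard /proc/<pid>/stat and /proc/<pid>/wchan layout; the default thread names are the ones from the top output:

```python
import sys
from pathlib import Path


def thread_name(stat_line):
    # Field 2 of /proc/<pid>/stat is the name in parentheses; it may
    # contain spaces or ')', so cut at the first '(' and the last ')'.
    return stat_line[stat_line.index("(") + 1 : stat_line.rindex(")")]


def find_pids(names, proc_root="/proc"):
    """Return {pid: name} for every process whose name is in `names`."""
    try:
        entries = list(Path(proc_root).iterdir())
    except OSError:
        return {}
    found = {}
    for entry in entries:
        if not entry.name.isdigit():
            continue
        try:
            name = thread_name((entry / "stat").read_text())
        except (OSError, ValueError):
            continue  # process exited, or stat unreadable/malformed
        if name in names:
            found[int(entry.name)] = name
    return found


def read_wchan(pid, proc_root="/proc"):
    """Return the wait channel, stripped (the file has no newline)."""
    try:
        return (Path(proc_root) / str(pid) / "wchan").read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    wanted = set(sys.argv[1:]) or {"istd1", "md_d0_raid5", "md_d0_resync"}
    for pid, name in sorted(find_pids(wanted).items()):
        print(f"{pid:>6} {name:<15} {read_wchan(pid)}")
```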

	Please note that the zombie processes are not significant on this
server. It runs a watchdog, and the zombie process count is always
between 0 and 2.

	When the iSCSI target hangs, the load average is 14.03, 13.63, 10.47,
with only md_d0_raid5, istd1 and md_d0_resync as running processes:

  9774 root      15  -5     0    0    0 R  100  0.0  18:17.63 istd1
  9738 root      15  -5     0    0    0 R  100  0.0  17:22.04 md_d0_raid5
  9739 root      15  -5     0    0    0 R   14  0.0   2:15.18 md_d0_resync
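On Linux the load average (the 1-, 5- and 15-minute figures above) is a damped count of tasks that are either runnable (state R) or in uninterruptible sleep (state D), not only of tasks using CPU; the istiod1 threads shown in state D in the earlier top output are of the second kind. A minimal Python sketch, not part of the original report and assuming the standard /proc/<pid>/task/<tid>/stat layout, lists which threads currently count toward that figure:

```python
from collections import Counter
from pathlib import Path


def iter_task_stats(proc_root="/proc"):
    """Yield the text of /proc/<pid>/task/<tid>/stat for every thread."""
    try:
        pid_dirs = [p for p in Path(proc_root).iterdir() if p.name.isdigit()]
    except OSError:
        return
    for pid_dir in pid_dirs:
        try:
            tids = [t for t in (pid_dir / "task").iterdir() if t.name.isdigit()]
        except OSError:
            continue  # process exited while scanning
        for tid_dir in tids:
            try:
                yield (tid_dir / "stat").read_text()
            except OSError:
                continue


def load_contributors(proc_root="/proc"):
    """Return (name, state) for every thread in state R or D."""
    found = []
    for line in iter_task_stats(proc_root):
        try:
            close = line.rindex(")")  # end of the (possibly odd) name
            name = line[line.index("(") + 1 : close]
            state = line[close + 2]  # state letter follows ") "
        except (ValueError, IndexError):
            continue  # malformed or truncated stat line
        if state in ("R", "D"):
            found.append((name, state))
    return found


if __name__ == "__main__":
    tasks = load_contributors()
    print(dict(Counter(state for _, state in tasks)))
    for name, state in sorted(tasks):
        print(f"{state} {name}")
```

Running it while the target hangs would show whether D-state threads, rather than the three spinning ones alone, make up most of the 14.03.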

	I will not reboot this server, in case you need any other information.

	Regards,

	JKB