How to force raid to stop, cleanly? More info

* How to force raid to stop, cleanly? More info
@ 2003-04-08 22:29 Stephen C Woods
  2003-04-09  1:11 ` Neil Brown
  0 siblings, 1 reply; 6+ messages in thread
From: Stephen C Woods @ 2003-04-08 22:29 UTC (permalink / raw)
  To: Linux-raid maillist


>   We're running 2.4.20,  we've come across a situation where the 
>raid and overlying LVM seems to be stuck.  No I/O is occurring, processes
>trying to access the raid or overlying volumes hang and can't be
>terminated.
>
>   What we'd like to do is to force the raid volume to terminate
>and sync its disks and superblocks, so that the raid volume is
>or thinks it is consistent.
>
> The overlying JFS will clean up when it restarts, but we don't want to
>wait 3 days for the RAID to finish recomputing parity (these are some really
>large arrays).

  Here's a little more info. 
(1) The SCSI disks are responding correctly; I can read from each drive.
(2) More processes are getting stuck: lilo, sync and mkdir.
(3) There are 2 processes stuck on rwsem_down_failed, both related to
    the raid/lvm.
(4) mdadm thinks everything is just fine.

Current ps output.
  PID STAT %CPU  WCHAN WIDE-WCHAN-COLUMN COMMAND,wchan=WIDE-WCHAN-COLUMN -o arg 
    1 S     0.0 14af8d do_select         init [3] 
    2 SW    0.0 127efd context_thread    [keventd]
    3 SW    0.0 11566a apm_mainloop      [kapmd]
    4 SWN   0.0 11f30e ksoftirqd         [ksoftirqd_CPU0]
    5 SW    0.1 133806 kswapd            [kswapd]
    6 SW    0.0 13fe2a bdflush           [bdflush]
    7 DW    0.1 a2a048 end               [kupdated]
    8 SW<   0.0 1ebfb5 md_thread         [mdrecoveryd]
   12 SW    0.0 81cb6c end               [kjournald]
   68 SW    0.0 83d52e end               [khubd]
  168 SW    0.0 81cb6c end               [kjournald]
  169 SW    0.0 81cb6c end               [kjournald]
  170 SW    0.0 81cb6c end               [kjournald]
  171 SW    0.0 81cb6c end               [kjournald]
  465 S     0.0 14af8d do_select         syslogd -m 0
  469 S     0.0 11a3d1 do_syslog         klogd -x
  486 S     0.0 14b716 do_poll           portmap
  505 S     0.0 14af8d do_select         rpc.statd
  586 S     0.0 14af8d do_select         /usr/sbin/apmd -p 10 -w 5 -W -P /etc/sy
  602 S     0.0 14b716 do_poll           ypserv
  614 S     0.0 14b716 do_poll           ypbind
  661 S     0.0 14af8d do_select         /usr/sbin/sshd
  675 S     0.0 14af8d do_select         xinetd -stayalive -reuse -pidfile /var/
  686 SL    0.0 14af8d do_select         ntpd -U ntp -g
  705 S     0.0 14b716 do_poll           rpc.rquotad
  709 SW    0.0 95bf32 end               [nfsd]
  710 SW    0.0 95bf32 end               [nfsd]
  711 SW    0.0 95bf32 end               [nfsd]
  712 SW    0.0 95bf32 end               [lockd]
  713 SW    0.0 95845e end               [rpciod]
  714 SW    0.0 95bf32 end               [nfsd]
  715 SW    0.0 95bf32 end               [nfsd]
  716 SW    0.0 95bf32 end               [nfsd]
  717 SW    0.0 95bf32 end               [nfsd]
  718 SW    0.0 95bf32 end               [nfsd]
  724 S     0.0 14b716 do_poll           rpc.mountd
  733 S     0.0 14af8d do_select         gpm -t ps/2 -m /dev/mouse
  742 S     0.0 1231e3 nanosleep         crond
  771 S     0.0 14af8d do_select         xfs -droppriv -daemon
  789 S     0.0 1231e3 nanosleep         /usr/sbin/atd
  802 S     0.0 1231e3 nanosleep         rhnsd --interval 120
  806 S     0.0 177e9d read_chan         /sbin/mingetty tty1
  807 S     0.0 177e9d read_chan         /sbin/mingetty tty2
  808 S     0.0 177e9d read_chan         /sbin/mingetty tty3
  809 S     0.0 177e9d read_chan         /sbin/mingetty tty4
  810 S     0.0 177e9d read_chan         /sbin/mingetty tty5
  811 S     0.0 177e9d read_chan         /sbin/mingetty tty6
  856 SW<   1.5 1ebfb5 md_thread         [raid5d]
  857 SWN   0.1 1ebfb5 md_thread         [raid5syncd]
  882 SW    0.0 a297cb end               [jfsIO]
  883 DW    0.0 24aa8c rwsem_down_failed [jfsCommit]
  884 SW    0.0 a2d7cd end               [jfsSync]
 1669 SW<   2.2 1ebfb5 md_thread         [raid5d]
 1670 SWN   0.5 1ebfb5 md_thread         [raid5syncd]
24442 DN    0.0 a2a048 end               rsync -avHx . /vg00/raid0/topanga/icsl.
24445 DN    0.0 a2a048 end               rsync -avHx . /vg00/raid1/topanga/ee.05
24500 DN    1.1 a2a048 end               rsync -avHx . /vg00/raid0/topanga/icsl.
24501 DN    0.8 24aa8c rwsem_down_failed rsync -avHx . /vg00/raid1/topanga/ee.05
26064 D     0.0 a2a048 end               mkdir topanga.030408
26110 D     0.0 107c4a down              ls
26116 D     0.0 a2a048 end               sync
26672 S     0.0 14af8d do_select         in.rlogind
26673 S     0.0 11da20 wait4             login -- root
26674 S     0.0 177e9d read_chan         -bash
26773 D     0.0 15011b wait_on_inode     lilo
26840 D     0.0 15011b wait_on_inode     lilo
26857 S     0.0 14af8d do_select         in.rlogind
26858 S     0.0 11da20 wait4             login -- root
26859 S     0.0 11da20 wait4             -bash
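
For anyone reading a listing like this later, the stuck processes are the ones whose STAT begins with D (uninterruptible sleep), and the WCHAN name shows where in the kernel each one is waiting. A minimal sketch of pulling those rows out, assuming the six-column layout shown above (PID, STAT, %CPU, numeric wchan, wchan name, command); the function name and the sample rows are only illustrative:

```python
def blocked_processes(ps_text):
    """Return (pid, wchan, command) for every row whose STAT starts with 'D'."""
    blocked = []
    for line in ps_text.splitlines():
        fields = line.split(None, 5)
        # Skip the header, quoted lines and anything without all six columns.
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        pid, stat, _cpu, _nwchan, wchan, command = fields
        if stat.startswith("D"):
            blocked.append((int(pid), wchan, command.strip()))
    return blocked


# A few rows copied from the listing above.
SAMPLE = """\
    7 DW    0.1 a2a048 end               [kupdated]
  856 SW<   1.5 1ebfb5 md_thread         [raid5d]
  883 DW    0.0 24aa8c rwsem_down_failed [jfsCommit]
26110 D     0.0 107c4a down              ls
26773 D     0.0 15011b wait_on_inode     lilo
"""

for pid, wchan, command in blocked_processes(SAMPLE):
    print(pid, wchan, command)
```

Grouping the result by the wchan column gives a quick view of which kernel wait points the hung processes share.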


-----
Stephen C. Woods; UCLA SEASnet; 2567 Boelter hall; LA CA 90095; (310)-825-8614
Unless otherwise noted these statements are my own, Not those of the 
University of California.                      Internet mail:scw@seas.ucla.edu

* Re: How to force raid to stop, cleanly? More info
  2003-04-08 22:29 How to force raid to stop, cleanly? More info Stephen C Woods
@ 2003-04-09  1:11 ` Neil Brown
  2003-04-09  8:01   ` Luca Berra
  2003-04-09 18:22   ` Stephen C Woods
  0 siblings, 2 replies; 6+ messages in thread
From: Neil Brown @ 2003-04-09  1:11 UTC (permalink / raw)
  To: Stephen C Woods; +Cc: Linux-raid maillist

On Tuesday April 8, scw@seas.ucla.edu wrote:
> 
> >   We're running 2.4.20,  we've come across a situation where the 
> >raid and overlying LVM seems to be stuck.  No I/O is occurring, processes
> >trying to access the raid or overlying volumes hang and can't be
> >terminated.
> >
> >   What we'd like to do is to force the raid volume to terminate
> >and sync its disks and superblocks, so that the raid volume is
> >or thinks it is consistent.
> >
> > The overlying JFS will clean up when it restarts, but we don't want to
> >wait 3 days for the RAID to finish recomputing parity (these are some really
> >large arrays).
> 
>   Here's a little more info. 
> (1) The SCSI disks are responding correctly; I can read from each drive.
> (2) More processes are getting stuck: lilo, sync and mkdir.
> (3) There are 2 processes stuck on rwsem_down_failed, both related to
>     the raid/lvm.
> (4) mdadm thinks everything is just fine.

Looks like a jfs bug of some sort.
I suspect your best bet is to 
   reboot -f -n
or
   alt-sysrq-U   # unmount
   alt-sysrq-S   # sync
   alt-sysrq-B   # boot

and hope that works.  It should write out the raid superblocks, but if
it doesn't, nothing else will.
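
If there is no console keyboard to hand, the same key sequence can in principle be sent from a shell that still responds. The sketch below is only an illustration under assumptions that are not from this thread: it assumes the kernel provides /proc/sysrq-trigger (a stock 2.4.20 may not), that magic SysRq is enabled, and that it is run as root. The function name is hypothetical, and it defaults to a dry run because the final key reboots the machine immediately.

```python
def send_sysrq(keys, trigger_path="/proc/sysrq-trigger", dry_run=True):
    """Write each SysRq command letter to the trigger file, one write per key.

    Returns the keys that were sent (or would be sent, in dry-run mode).
    """
    sent = []
    for key in keys:
        if not dry_run:
            # Assumption: writing a single letter triggers that SysRq action.
            with open(trigger_path, "w") as trigger:
                trigger.write(key)
        sent.append(key)
    return sent


# Same order as the keystrokes above: U, then S, then B.
EMERGENCY_SEQUENCE = ["u", "s", "b"]

print(send_sysrq(EMERGENCY_SEQUENCE))  # dry run: nothing is written
```

Pass dry_run=False only when you really mean it; on a machine that is already wedged, the keyboard sequence is likely to be the more reliable route.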

NeilBrown

* Re: How to force raid to stop, cleanly? More info
  2003-04-09  1:11 ` Neil Brown
@ 2003-04-09  8:01   ` Luca Berra
  2003-04-09 18:22   ` Stephen C Woods
  1 sibling, 0 replies; 6+ messages in thread
From: Luca Berra @ 2003-04-09  8:01 UTC (permalink / raw)
  To: Linux-raid maillist

On Wed, Apr 09, 2003 at 11:11:20AM +1000, Neil Brown wrote:
>   alt-sysrq-U   # unmount
>   alt-sysrq-S   # sync
>   alt-sysrq-B   # boot
Btw, some time ago there was a patch that added switch-md-to-readonly
functionality to alt-sysrq. Does anyone know where that patch went?

L.

-- 
Luca Berra -- bluca@comedia.it
        Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \

* Re: How to force raid to stop, cleanly? More info
  2003-04-09  1:11 ` Neil Brown
  2003-04-09  8:01   ` Luca Berra
@ 2003-04-09 18:22   ` Stephen C Woods
  2003-04-09 18:33     ` Kevin P. Fleming
  2003-04-09 23:12     ` Neil Brown
  1 sibling, 2 replies; 6+ messages in thread
From: Stephen C Woods @ 2003-04-09 18:22 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linux-raid maillist

Neil,
   Thanks a lot, the raid did shut down cleanly and restarted just fine;
looks like JFS isn't quite mature enough yet.

   With that in mind, fiddling with fdisk implies that one can
subpartition a raid device, but the FAQ doesn't seem to have any info, so
I'm asking before we go and tear down what we already have.

   Does the md driver support partitions in a meaningful/useful way?

-----
Stephen C. Woods; UCLA SEASnet; 2567 Boelter hall; LA CA 90095; (310)-825-8614
Unless otherwise noted these statements are my own, Not those of the 
University of California.                      Internet mail:scw@seas.ucla.edu

* Re: How to force raid to stop, cleanly? More info
  2003-04-09 18:22   ` Stephen C Woods
@ 2003-04-09 18:33     ` Kevin P. Fleming
  2003-04-09 23:12     ` Neil Brown
  1 sibling, 0 replies; 6+ messages in thread
From: Kevin P. Fleming @ 2003-04-09 18:33 UTC (permalink / raw)
  To: Linux-raid maillist

Stephen C Woods wrote:

>    Does the md driver support partitions in a meaningful/useful way?

No, the md driver (like many other block drivers in Linux) does not support 
partitions. Yes, you can use your tool of choice to put a partition 
table on the device, but it won't be used.

You can use LVM1/LVM2 to break up your md device into usable chunks 
(which is what I do).

* Re: How to force raid to stop, cleanly? More info
  2003-04-09 18:22   ` Stephen C Woods
  2003-04-09 18:33     ` Kevin P. Fleming
@ 2003-04-09 23:12     ` Neil Brown
  1 sibling, 0 replies; 6+ messages in thread
From: Neil Brown @ 2003-04-09 23:12 UTC (permalink / raw)
  To: Stephen C Woods; +Cc: Linux-raid maillist

On Wednesday April 9, scw@seas.ucla.edu wrote:
> Neil,
>    Thanks a lot, the raid did shut down cleanly and restarted just fine;
> looks like JFS isn't quite mature enough yet.
> 
>    With that in mind, fiddling with fdisk implies that one can
> subpartition a raid device, but the FAQ doesn't seem to have any info, so
> I'm asking before we go and tear down what we already have.
> 
>    Does the md driver support partitions in a meaningful/useful way?

Not without patches.

If you look around
   http://www.cse.unsw.edu.au/~neilb/patches/linux-stable/
you should find a collection of patches.

You particularly need MdPart and MdpMajor but they probably depend on
most of the other Md or Raid patches in the set.
With this, the first 16 md devices can each be partitioned into up to
15 partitions.
You use major device '60' and minor devices:
  0-15 for md0
  16-31 for md1
etc.

  60,0 is much the same as 9,0
  60,1 is the first partition
  etc.
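
As a rough illustration of that numbering, here is a small sketch that computes the (major, minor) pair for a given array and partition. It assumes each array owns a block of 16 consecutive minors (minor 0 of the block is the whole array, 1-15 are its partitions), so md1 starts at minor 16. The function and constant names are made up for the example and are not taken from the patches.

```python
MDP_MAJOR = 60          # major number quoted above
MINORS_PER_ARRAY = 16   # assumption: whole-array node plus 15 partitions
MAX_ARRAYS = 16         # "the first 16 md devices"


def mdp_device_number(array, partition=0):
    """Return (major, minor) for partition N of mdX; partition 0 is the whole array."""
    if not 0 <= array < MAX_ARRAYS:
        raise ValueError("array index must be in 0..15")
    if not 0 <= partition < MINORS_PER_ARRAY:
        raise ValueError("partition must be in 0..15")
    return MDP_MAJOR, array * MINORS_PER_ARRAY + partition


print(mdp_device_number(0))      # whole md0
print(mdp_device_number(0, 1))   # first partition of md0
print(mdp_device_number(1))      # whole md1
```

The resulting pair is what one would hand to mknod when creating the device nodes by hand.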

There are no matching patches for 2.5 yet.  I'm waiting to see what
happens when the 32-bit device number stuff settles down.

NeilBrown
