* How to force raid to stop, cleanly? More info
@ 2003-04-08 22:29 Stephen C Woods
2003-04-09 1:11 ` Neil Brown
0 siblings, 1 reply; 6+ messages in thread
From: Stephen C Woods @ 2003-04-08 22:29 UTC (permalink / raw)
To: Linux-raid maillist
> We're running 2.4.20, and we've come across a situation where the
>raid and overlying LVM seem to be stuck. No I/O is occurring; processes
>trying to access the raid or overlying volumes hang and can't be
>terminated.
>
> What we'd like to do is to force the raid volume to terminate
>and sync its disks and superblocks, so that the raid volume is
>or thinks it is consistent.
>
> The overlying JFS will clean up when it restarts, but we don't want to
>wait 3 days for the RAID to finish recomputing parity (these are some really
>large arrays).
Here's a little more info.
(1) The SCSI disks are responding correctly; I can read from each drive.
(2) More processes are getting stuck: lilo, sync and mkdir.
(3) There are 2 processes stuck on rwsem_down_failed, both related to
the raid/lvm.
(4) mdadm thinks everything is just fine.
Current ps output.
PID STAT %CPU WCHAN WIDE-WCHAN-COLUMN COMMAND,wchan=WIDE-WCHAN-COLUMN -o arg
1 S 0.0 14af8d do_select init [3]
2 SW 0.0 127efd context_thread [keventd]
3 SW 0.0 11566a apm_mainloop [kapmd]
4 SWN 0.0 11f30e ksoftirqd [ksoftirqd_CPU0]
5 SW 0.1 133806 kswapd [kswapd]
6 SW 0.0 13fe2a bdflush [bdflush]
7 DW 0.1 a2a048 end [kupdated]
8 SW< 0.0 1ebfb5 md_thread [mdrecoveryd]
12 SW 0.0 81cb6c end [kjournald]
68 SW 0.0 83d52e end [khubd]
168 SW 0.0 81cb6c end [kjournald]
169 SW 0.0 81cb6c end [kjournald]
170 SW 0.0 81cb6c end [kjournald]
171 SW 0.0 81cb6c end [kjournald]
465 S 0.0 14af8d do_select syslogd -m 0
469 S 0.0 11a3d1 do_syslog klogd -x
486 S 0.0 14b716 do_poll portmap
505 S 0.0 14af8d do_select rpc.statd
586 S 0.0 14af8d do_select /usr/sbin/apmd -p 10 -w 5 -W -P /etc/sy
602 S 0.0 14b716 do_poll ypserv
614 S 0.0 14b716 do_poll ypbind
661 S 0.0 14af8d do_select /usr/sbin/sshd
675 S 0.0 14af8d do_select xinetd -stayalive -reuse -pidfile /var/
686 SL 0.0 14af8d do_select ntpd -U ntp -g
705 S 0.0 14b716 do_poll rpc.rquotad
709 SW 0.0 95bf32 end [nfsd]
710 SW 0.0 95bf32 end [nfsd]
711 SW 0.0 95bf32 end [nfsd]
712 SW 0.0 95bf32 end [lockd]
713 SW 0.0 95845e end [rpciod]
714 SW 0.0 95bf32 end [nfsd]
715 SW 0.0 95bf32 end [nfsd]
716 SW 0.0 95bf32 end [nfsd]
717 SW 0.0 95bf32 end [nfsd]
718 SW 0.0 95bf32 end [nfsd]
724 S 0.0 14b716 do_poll rpc.mountd
733 S 0.0 14af8d do_select gpm -t ps/2 -m /dev/mouse
742 S 0.0 1231e3 nanosleep crond
771 S 0.0 14af8d do_select xfs -droppriv -daemon
789 S 0.0 1231e3 nanosleep /usr/sbin/atd
802 S 0.0 1231e3 nanosleep rhnsd --interval 120
806 S 0.0 177e9d read_chan /sbin/mingetty tty1
807 S 0.0 177e9d read_chan /sbin/mingetty tty2
808 S 0.0 177e9d read_chan /sbin/mingetty tty3
809 S 0.0 177e9d read_chan /sbin/mingetty tty4
810 S 0.0 177e9d read_chan /sbin/mingetty tty5
811 S 0.0 177e9d read_chan /sbin/mingetty tty6
856 SW< 1.5 1ebfb5 md_thread [raid5d]
857 SWN 0.1 1ebfb5 md_thread [raid5syncd]
882 SW 0.0 a297cb end [jfsIO]
883 DW 0.0 24aa8c rwsem_down_failed [jfsCommit]
884 SW 0.0 a2d7cd end [jfsSync]
1669 SW< 2.2 1ebfb5 md_thread [raid5d]
1670 SWN 0.5 1ebfb5 md_thread [raid5syncd]
24442 DN 0.0 a2a048 end rsync -avHx . /vg00/raid0/topanga/icsl.
24445 DN 0.0 a2a048 end rsync -avHx . /vg00/raid1/topanga/ee.05
24500 DN 1.1 a2a048 end rsync -avHx . /vg00/raid0/topanga/icsl.
24501 DN 0.8 24aa8c rwsem_down_failed rsync -avHx . /vg00/raid1/topanga/ee.05
26064 D 0.0 a2a048 end mkdir topanga.030408
26110 D 0.0 107c4a down ls
26116 D 0.0 a2a048 end sync
26672 S 0.0 14af8d do_select in.rlogind
26673 S 0.0 11da20 wait4 login -- root
26674 S 0.0 177e9d read_chan -bash
26773 D 0.0 15011b wait_on_inode lilo
26840 D 0.0 15011b wait_on_inode lilo
26857 S 0.0 14af8d do_select in.rlogind
26858 S 0.0 11da20 wait4 login -- root
26859 S 0.0 11da20 wait4 -bash
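A listing like the one above is easier to read once the processes in uninterruptible sleep (STAT starting with D) are grouped by wait channel. The following is only a minimal sketch, assuming each line has the six whitespace-separated columns shown above (PID, STAT, %CPU, WCHAN address, WCHAN name, command); the function name stuck_by_wchan is made up for illustration and the sample lines are copied from the listing.

```python
from collections import defaultdict


def stuck_by_wchan(ps_lines):
    """Group processes in uninterruptible sleep (STAT 'D...') by wait channel.

    Assumes six columns per line: PID STAT %CPU WCHAN-address WCHAN-name COMMAND.
    """
    groups = defaultdict(list)
    for line in ps_lines:
        # Split at most 5 times so the command keeps its own spaces.
        fields = line.split(None, 5)
        if len(fields) < 6 or not fields[0].isdigit():
            continue  # skip the header and any malformed line
        pid, stat, _cpu, _addr, wchan, command = fields
        if stat.startswith("D"):
            groups[wchan].append((int(pid), command))
    return dict(groups)


# A few lines copied from the listing above.
sample = [
    "7 DW 0.1 a2a048 end [kupdated]",
    "856 SW< 1.5 1ebfb5 md_thread [raid5d]",
    "883 DW 0.0 24aa8c rwsem_down_failed [jfsCommit]",
    "24501 DN 0.8 24aa8c rwsem_down_failed rsync -avHx . /vg00/raid1/topanga/ee.05",
    "26110 D 0.0 107c4a down ls",
]

for wchan, procs in sorted(stuck_by_wchan(sample).items()):
    print(wchan, procs)
```

On the sample, jfsCommit and one rsync end up together under rwsem_down_failed, which are the two processes mentioned in point (3).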
-----
Stephen C. Woods; UCLA SEASnet; 2567 Boelter hall; LA CA 90095; (310)-825-8614
Unless otherwise noted these statements are my own, Not those of the
University of California. Internet mail:scw@seas.ucla.edu
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: How to force raid to stop, cleanly? More info
2003-04-08 22:29 How to force raid to stop, cleanly? More info Stephen C Woods
@ 2003-04-09 1:11 ` Neil Brown
2003-04-09 8:01 ` Luca Berra
2003-04-09 18:22 ` Stephen C Woods
0 siblings, 2 replies; 6+ messages in thread
From: Neil Brown @ 2003-04-09 1:11 UTC (permalink / raw)
To: Stephen C Woods; +Cc: Linux-raid maillist
On Tuesday April 8, scw@seas.ucla.edu wrote:
>
> > We're running 2.4.20, and we've come across a situation where the
> >raid and overlying LVM seem to be stuck. No I/O is occurring; processes
> >trying to access the raid or overlying volumes hang and can't be
> >terminated.
> >
> > What we'd like to do is to force the raid volume to terminate
> >and sync its disks and superblocks, so that the raid volume is
> >or thinks it is consistent.
> >
> > The overlying JFS will clean up when it restarts, but we don't want to
> >wait 3 days for the RAID to finish recomputing parity (these are some really
> >large arrays).
>
> Here's a little more info.
> (1)The scsi disks are responding correctly, I can read from each drive.
> (2) More processes are getting stuck, lilo, sync and mkdir
> (3) There are 2 processes stuck on rwsem_down_failed both related to
> the raid/lvm.
> (3) mdadm thinks the everything is just fine.
Looks like a jfs bug of some sort.
I suspect your best bet is to
reboot -f -n
or
alt-sysrq-U # unmount
alt-sysrq-S # sync
alt-sysrq-B # boot
and hope that works. It should write out the raid superblocks, but if
it doesn't nothing else will.
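When no console keyboard is at hand, the same key sequence can be sent by writing the key letters to /proc/sysrq-trigger. This is only a sketch under assumptions: it presumes a kernel that provides /proc/sysrq-trigger (a stock 2.4.20 may not), it must run as root, and the function name sysrq_sequence and the 2-second pause are illustrative choices. It defaults to a dry run that only prints what it would do.

```python
import time

# Key letters in the order suggested above. Note that SysRq-U remounts
# filesystems read-only (labelled "unmount" above) and SysRq-B reboots
# immediately without any further syncing.
SYSRQ_KEYS = [
    ("u", "remount filesystems read-only"),
    ("s", "sync"),
    ("b", "reboot immediately"),
]


def sysrq_sequence(trigger_path="/proc/sysrq-trigger", dry_run=True, pause=2.0):
    """Send the U, S, B SysRq keys in order; return the keys handled.

    Assumption: trigger_path exists and is writable (root only).
    With dry_run=True nothing is written, the plan is only printed.
    """
    handled = []
    for key, meaning in SYSRQ_KEYS:
        handled.append(key)
        if dry_run:
            print(f"would write {key!r} to {trigger_path}: {meaning}")
            continue
        with open(trigger_path, "w") as trigger:
            trigger.write(key)
        # Give the kernel a moment to act before sending the next key.
        time.sleep(pause)
    return handled


if __name__ == "__main__":
    sysrq_sequence(dry_run=True)
```

Calling sysrq_sequence(dry_run=False) would actually reboot the machine, so the dry run is the default on purpose.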
NeilBrown
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: How to force raid to stop, cleanly? More info
2003-04-09 1:11 ` Neil Brown
@ 2003-04-09 8:01 ` Luca Berra
2003-04-09 18:22 ` Stephen C Woods
1 sibling, 0 replies; 6+ messages in thread
From: Luca Berra @ 2003-04-09 8:01 UTC (permalink / raw)
To: Linux-raid maillist
On Wed, Apr 09, 2003 at 11:11:20AM +1000, Neil Brown wrote:
> alt-sysrq-U # unmount
> alt-sysrq-S # sync
> alt-sysrq-B # boot
btw, some time ago there was a patch that added switch-md-to-readonly
functionality to alt-sysrq. does anyone know where that patch went?
L.
--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: How to force raid to stop, cleanly? More info
2003-04-09 1:11 ` Neil Brown
2003-04-09 8:01 ` Luca Berra
@ 2003-04-09 18:22 ` Stephen C Woods
2003-04-09 18:33 ` Kevin P. Fleming
2003-04-09 23:12 ` Neil Brown
1 sibling, 2 replies; 6+ messages in thread
From: Stephen C Woods @ 2003-04-09 18:22 UTC (permalink / raw)
To: Neil Brown; +Cc: Linux-raid maillist
Neil,
Thanks a lot, the raid did shut down cleanly and restarted just fine,
looks like JFS isn't quite mature enough yet.
With that in mind, fiddling with fdisk implies that one can
subpartition a raid device, but the FAQ doesn't seem to have any info, so
we'd like to ask before we go and tear down what we already have.
Does the md driver support partitions in a meaningful/useful way?
-----
Stephen C. Woods; UCLA SEASnet; 2567 Boelter hall; LA CA 90095; (310)-825-8614
Unless otherwise noted these statements are my own, Not those of the
University of California. Internet mail:scw@seas.ucla.edu
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: How to force raid to stop, cleanly? More info
2003-04-09 18:22 ` Stephen C Woods
@ 2003-04-09 18:33 ` Kevin P. Fleming
2003-04-09 23:12 ` Neil Brown
1 sibling, 0 replies; 6+ messages in thread
From: Kevin P. Fleming @ 2003-04-09 18:33 UTC (permalink / raw)
To: Linux-raid maillist
Stephen C Woods wrote:
> Does the md driver support partitions in a meaningful/useful way?
No, the md driver (like many other block drivers in Linux) does not support
partitions. Yes, you can use your tool of choice to put a partition
table on the device, but it won't be used.
You can use LVM1/LVM2 to break up your md device into usable chunks
(which is what I do).
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: How to force raid to stop, cleanly? More info
2003-04-09 18:22 ` Stephen C Woods
2003-04-09 18:33 ` Kevin P. Fleming
@ 2003-04-09 23:12 ` Neil Brown
1 sibling, 0 replies; 6+ messages in thread
From: Neil Brown @ 2003-04-09 23:12 UTC (permalink / raw)
To: Stephen C Woods; +Cc: Linux-raid maillist
On Wednesday April 9, scw@seas.ucla.edu wrote:
> Neil,
> Thanks a lot, the raid did shut down cleanly and restarted just fine,
> looks like JFS isn't quite mature enough yet.
>
> With that in mind, fiddling with fdisk implies that one can
> subpartition a raid device, but the FAQ doesn't seem to have any info, so
> we'd like to ask before we go and tear down what we already have.
>
> Does the md driver support partitions in a meaningful/useful way?
Not without patches.
If you look around
http://www.cse.unsw.edu.au/~neilb/patches/linux-stable/
you should find a collection of patches.
You particularly need MdPart and MdpMajor but they probably depend on
most of the other Md or Raid patches in the set.
With this, the first 16 md devices can each be partitioned into up to
15 partitions.
You use major device '60' and minor devices:
0-15 for md0
16-31 for md1
etc.
60,0 is much the same as 9,0
60,1 is the first partition
etc.
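The numbering above works out to 16 minors per array: one for the whole device plus up to 15 partitions, so each array starts at a multiple of 16. A minimal sketch of that arithmetic, assuming major 60 as used by the patches described above; the name mdp_devnum and the constants are illustrative only and not part of the patch set.

```python
MDP_MAJOR = 60         # major number used by the partitionable-md patches above
MINORS_PER_ARRAY = 16  # whole device (0) plus up to 15 partitions (1-15)
MAX_ARRAYS = 16        # only the first 16 md devices can be partitioned


def mdp_devnum(array, partition=0):
    """Return (major, minor) for partition `partition` of md array `array`.

    partition 0 means the whole array, e.g. 60,0 is much the same as 9,0.
    """
    if not 0 <= array < MAX_ARRAYS:
        raise ValueError("array must be in 0..15")
    if not 0 <= partition < MINORS_PER_ARRAY:
        raise ValueError("partition must be in 0..15")
    return MDP_MAJOR, array * MINORS_PER_ARRAY + partition


print(mdp_devnum(0))      # whole md0
print(mdp_devnum(0, 1))   # first partition of md0
print(mdp_devnum(1))      # whole md1
print(mdp_devnum(1, 15))  # last partition of md1
```

With 16 arrays of 16 minors each, the scheme uses minors 0 through 255 of major 60.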
There are no matching patches for 2.5 yet. I'm waiting to see what
happens when the 32-bit device number stuff settles down.
NeilBrown
^ permalink raw reply [flat|nested] 6+ messages in thread