* Really fucked up raid0 array
@ 2004-07-05 11:39 Geir Råness
From: Geir Råness @ 2004-07-05 11:39 UTC
To: linux-raid
Hi
When I woke up this morning I noticed that my ps aux suddenly stopped while
listing processes, and refused to either abort or complete.
So I started to worry that something was wrong, and I noticed in dmesg that
one of my software RAID arrays had got errors during the night.
hdn: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdn: dma_intr: error=0x40 { UncorrectableError }, LBAsect=51001032,
high=3, low=669384, sector=51001032
end_request: I/O error, dev 58:40 (hdn), sector 51001032
So I tried to unmount the device with umount -f /some/fuckedArray,
but it refuses to let me unmount it.
So is there any way to force the RAID array to shut down?
raidstop doesn't let me stop it, since the array is "active".
Best Regards
Geir Råness
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Really fucked up raid0 array
From: maarten van den Berg @ 2004-07-05 14:58 UTC
To: linux-raid
On Monday 05 July 2004 13:39, Geir Råness wrote:
> [snip]
> So I tried to unmount the device with umount -f /some/fuckedArray,
> but it refuses to let me unmount it.
Maybe you cannot umount it because it's still in use? In that case, run
'lsof | grep <mountpoint>' to see which processes are using files on that
mountpoint, and terminate those processes first.
Maarten
--
When I answered where I wanted to go today, they just hung up -- Unknown
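If lsof itself hangs, one possible reason is that it presumably stat()s the files it reports, which can block on a dead array. A minimal, Linux-only Python sketch of the same search, assuming /proc is mounted: it only calls readlink() on the /proc/<pid>/cwd, root, exe and fd/* symlinks and compares the link text with the mount point as a string, so it never stat()s the suspect filesystem directly. The function name `pids_using` is hypothetical, and nothing here guarantees that the kernel will not block somewhere else.

```python
import os


def pids_using(mountpoint: str) -> dict[int, list[str]]:
    """Hypothetical helper: map PID -> /proc links (cwd, root, exe, fd/N) under mountpoint.

    `mountpoint` should be an absolute path spelled as in /proc/mounts. It is
    normalised lexically only (no realpath), so this function does not stat()
    the possibly hung filesystem itself.
    """
    mp = os.path.normpath(mountpoint)
    prefix = mp.rstrip("/") + "/"  # "/" stays "/", "/mnt/raid" becomes "/mnt/raid/"
    hits: dict[int, list[str]] = {}
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue  # not a process directory
        base = f"/proc/{name}"
        links = ["cwd", "root", "exe"]
        try:
            links += [f"fd/{fd}" for fd in os.listdir(f"{base}/fd")]
        except OSError:
            pass  # process exited, or its fd/ directory is not readable by us
        for link in links:
            try:
                target = os.readlink(f"{base}/{link}")
            except OSError:
                continue  # process vanished or permission denied
            if target == mp or target.startswith(prefix):
                hits.setdefault(int(name), []).append(link)
    return hits
```

Run as root to see every process; `pids_using("/some/fuckedArray")` would list, for example, a shell whose `cwd` is on the array. The `fuser -m` command from psmisc serves a similar purpose.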
* Re: Really fucked up raid0 array
From: Geir Råness @ 2004-07-05 15:02 UTC
To: linux-raid
maarten van den Berg wrote:
>On Monday 05 July 2004 13:39, Geir Råness wrote:
>> [snip]
>Maybe you cannot umount it because it's still in use? In that case, run
>'lsof | grep <mountpoint>' to see which processes are using files on that
>mountpoint, and terminate those processes first.
>
>Maarten
Way ahead of you.
lsof freezes too, so I'm not able to find out what is using the disk.
All programs like:
ps
w
finger
who
lsof
ls
and stuff like that freeze.
I also tried killing programs that might be in the danger zone of using the
disk, killall -9 blahblah, and it freezes :)
Best Regards
Geir Råness
[parent not found: <40E96CA5.9030500@pulz.no>]
* Re: Really fucked up raid0 array
From: maarten van den Berg @ 2004-07-05 18:29 UTC
To: Geir Råness, linux-raid
On Monday 05 July 2004 16:58, you wrote:
> maarten van den Berg wrote:
> >On Monday 05 July 2004 13:39, Geir Råness wrote:
> >Maybe you cannot umount it because it's still in use? In that case, run
> >'lsof | grep <mountpoint>' to see which processes are using files on that
> >mountpoint, and terminate those processes first.
> Way ahead of you.
> lsof freezes too, so I'm not able to find out what is using the disk.
>
> All programs like:
> ps
> w
> finger
> who
> lsof
> ls
>
> and stuff like that freeze
Hm... Sorry to hear that. I've had the same thing happen to me sometimes,
and I always wondered whether there is a way out of that predicament...
But so far I have not found any other way than pressing reset.
I don't remember exactly when it happened, but one surefire way I know to
trigger it is running 'df' when you've mounted an NFS share (without "intr")
and have lost connectivity to that share. Not only does the process hang,
but any subsequent attempt to kill that process hangs, too. Wash, rinse... :-(
My guess is that the kernel is in a severe state of panic. Maybe swap was on
that missing array(?), but a host of other reasons could have led to this too.
> I also tried killing programs that might be in the danger zone of using
> the disk, killall -9 blahblah
> and it freezes :)
I know the feeling. At this point, the only thing I can suggest is to either
try to salvage things with the SysRq keys [if enabled], or else run shutdown
and, when shutdown hangs too (in my experience it probably will), press reset
once disk activity seems to have stopped.
But maybe more enlightened people here have better suggestions...?
Maarten
--
When I answered where I wanted to go today, they just hung up -- Unknown
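Maarten's SysRq suggestion can also be driven from a root shell that still responds. A minimal Python sketch, assuming a modern Linux kernel: the bit values follow the kernel's sysrq documentation, the 'w' key (dump tasks stuck in uninterruptible sleep) may not exist on 2004-era kernels, and the helper names are hypothetical. According to that documentation the /proc/sys/kernel/sysrq mask only restricts the keyboard combination, while root may write to /proc/sysrq-trigger regardless. The emergency sync may never finish for the failed array.

```python
import time
from typing import Callable, Dict, List, Optional

# Bits of /proc/sys/kernel/sysrq (0 = disabled, 1 = all functions,
# any other value = bitmask), following the kernel's sysrq documentation.
SYSRQ_BITS: Dict[int, str] = {
    0x2: "console log level",
    0x4: "keyboard control",
    0x8: "debug dumps (e.g. 'w': show blocked tasks)",
    0x10: "sync ('s')",
    0x20: "remount read-only ('u')",
    0x40: "signal processes ('e', 'i')",
    0x80: "reboot/poweroff ('b', 'o')",
    0x100: "renice real-time tasks",
}


def decode_sysrq_mask(value: int) -> List[str]:
    """Hypothetical helper: SysRq functions a given mask allows from the keyboard."""
    if value == 0:
        return []
    if value == 1:
        return list(SYSRQ_BITS.values())
    return [desc for bit, desc in SYSRQ_BITS.items() if value & bit]


def write_sysrq(key: str, path: str = "/proc/sysrq-trigger") -> None:
    """Trigger one SysRq function by writing its key character (requires root)."""
    with open(path, "w") as f:
        f.write(key)


def emergency_sequence(reboot: bool = False, pause: float = 2.0,
                       trigger: Optional[Callable[[str], None]] = None) -> List[str]:
    """Hypothetical helper: dump blocked tasks, sync, remount read-only, optionally reboot."""
    send = trigger or write_sysrq
    keys = ["w", "s", "u"] + (["b"] if reboot else [])
    for key in keys:
        send(key)
        time.sleep(pause)  # give the emergency sync/remount a moment to run
    return keys
```

Check the keyboard setting with `decode_sysrq_mask(int(open("/proc/sys/kernel/sysrq").read()))`. 'b' reboots immediately without unmounting anything, so 's' and 'u' are sent first; passing `trigger=print` gives a dry run.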
* Re: Really fucked up raid0 array
From: Mark Overmeer @ 2004-07-06 7:22 UTC
To: maarten van den Berg; +Cc: linux-raid
* maarten van den Berg (maarten@ultratux.net) [040705 20:22]:
> > >Maybe you cannot umount it because it's still in use? In that case, run
> > >'lsof | grep <mountpoint>' to see which processes are using files on that
> > >mountpoint, and terminate those processes first.
> > Way ahead of you.
> > lsof freezes too, so I'm not able to find out what is using the disk.
> >
> > All programs like:
> > ps
> > w
> > finger
> > who
> > lsof
> > ls
> >
> > and stuff like that freeze
What I recall from a little investigation way back, when a hanging NFS mount
froze the system all the time, is that this has to do with your mount point...
Most programs (I do not understand why, but on really nearly all systems) call
getcwd() to get their current working directory. getcwd() is quite silly...
it does cd ..; cd ..; cd .. until it arrives at / and then descends back into
the tree based on the inodes of the directories it encountered. This way, the
absolute path (without symbolic links) of the current directory is found.
Well, a problem appears when jumping up (with cd ..) over a mount point,
because the root inode of each filesystem has number 2. In that case, when
descending back to figure out the path, the directory that contains the mount
point has to be scanned in detail: each mount point in that directory is asked
for its device number. Asking for the device number of a stale NFS mount or of
a RAID set in an illegal state may have different effects; it is simply an
implementation issue in the driver. In some cases it blocks (waiting for NFS
or RAID to come up) and sometimes it results in an error.
Both have their own advantages: for instance, a network connection may be
lost for a few seconds, and you do not want all programs to crash immediately
because their NFS data is lost. But when a remote NFS server is down for a
long time, you may want to get an error as fast as possible, or at least be
able to interrupt the waiting process. (On traditional UNIX systems you cannot
interrupt processes which are waiting in the kernel. NFS is in the kernel, so
on those systems you cannot interrupt processes waiting for a response from
the NFS server: stale NFS.)
So it is really smart (as a general rule of thumb for the average UNIX system)
to keep the mount points of sub-systems/network file systems away from the
tree with normal commands. So: do not mount in /, but for instance in /mnt.
Then, if you need to, simplify the path for the users by creating symlinks:
/home -> /mnt/home
/mnt/home is the RAID array mount point
Ok, long story, which may or may not have any relation to your problem.
However, the behavior you report is very characteristic of this problem.
--
MarkOv
------------------------------------------------------------------------
drs Mark A.C.J. Overmeer MARKOV Solutions
Mark@Overmeer.net solutions@overmeer.net
http://Mark.Overmeer.net http://solutions.overmeer.net
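Mark's description of getcwd() can be made concrete. The Python sketch below, with the hypothetical name `classic_getcwd`, walks `..` upward and at each level searches the parent directory for the entry whose (st_dev, st_ino) matches the child, recording names on the way up instead of in a separate descent. Within one filesystem the inode number returned by readdir() is enough; once a mount point is crossed, the directory entry carries the inode of the covered directory, so every entry has to be lstat()ed, and that lstat() is where a stale NFS mount or a dead RAID array would block. This only illustrates the traditional user-space algorithm; on Linux, glibc normally gets the answer from the kernel's getcwd system call instead.

```python
import os


def classic_getcwd(start: str = ".") -> str:
    """Hypothetical helper: rebuild the absolute path of `start` by walking '..' to '/'."""
    root = os.stat("/")
    path = start
    here = os.stat(path)
    names: list[str] = []
    while (here.st_dev, here.st_ino) != (root.st_dev, root.st_ino):
        parent = os.path.join(path, "..")
        parent_st = os.stat(parent)
        name = None
        if parent_st.st_dev == here.st_dev:
            # Same filesystem: the d_ino from readdir() identifies the entry
            # without any stat() call.
            with os.scandir(parent) as entries:
                for entry in entries:
                    if entry.inode() == here.st_ino:
                        name = entry.name
                        break
        if name is None:
            # Crossed a mount point (or d_ino did not match): lstat() every
            # entry, which follows mounts and so reaches each mounted filesystem.
            # On a hung mount this is the call that blocks.
            with os.scandir(parent) as entries:
                for entry in entries:
                    try:
                        st = os.lstat(os.path.join(parent, entry.name))
                    except OSError:
                        continue  # unreadable entry: skip it
                    if (st.st_dev, st.st_ino) == (here.st_dev, here.st_ino):
                        name = entry.name
                        break
        if name is None:
            raise FileNotFoundError(f"no entry for {path!r} in its parent")
        names.append(name)
        path, here = parent, parent_st
    return "/" + "/".join(reversed(names))
```

Called from inside the RAID mount, the expensive branch runs when the walk climbs out of the array's root directory, which matches the freeze Mark describes.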
* Re: Really fucked up raid0 array
From: TJ Harrell @ 2004-07-05 16:11 UTC
To: linux-raid
This is a long shot, but it's bitten me before...
If you log in and then su to root while your cwd is on the RAID array, that
bash shell will keep the RAID array busy. It stays in the background and
hangs around while you're su'd.
Also, mount has an option to remount read-only (mount -o remount,ro) for
cases where you can't unmount. Not really sure where you could go from
there, though.
* RE: Really messed up raid0 array
From: Guy @ 2004-07-05 18:06 UTC
To: linux-raid
I have some ideas...oops you said the f word, bye.
-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Geir Råness
Sent: Monday, July 05, 2004 7:40 AM
To: linux-raid@vger.kernel.org
Subject: Really messed up raid0 array
[snip]
* RE: Really messed up raid0 array
From: Gregory Leblanc @ 2004-07-05 19:06 UTC
Cc: linux-raid
On Mon, 2004-07-05 at 11:06, Guy wrote:
> I have some ideas...oops you said the f word, bye.
If you're not going to contribute, please don't post at all.
Thanks,
Greg
[snip]
* RE: Really messed up raid0 array
From: Guy @ 2004-07-05 21:01 UTC
Cc: linux-raid
Does your message contribute?
I guess you are a "do as I say, not as I do" person!
-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Gregory Leblanc
Sent: Monday, July 05, 2004 3:06 PM
To: unlisted-recipients:; no To-header on input
Cc: linux-raid@vger.kernel.org
Subject: RE: Really messed up raid0 array
[snip]
Thread overview: 9+ messages
2004-07-05 11:39 Really fucked up raid0 array Geir Råness
2004-07-05 14:58 ` maarten van den Berg
2004-07-05 15:02 ` Geir Råness
[not found] ` <40E96CA5.9030500@pulz.no>
2004-07-05 18:29 ` maarten van den Berg
2004-07-06 7:22 ` Mark Overmeer
2004-07-05 16:11 ` TJ Harrell
2004-07-05 18:06 ` Really messed " Guy
2004-07-05 19:06 ` Gregory Leblanc
2004-07-05 21:01 ` Guy