Really fucked up raid0 array

linux-raid.vger.kernel.org archive mirror

* Really fucked up raid0 array
@ 2004-07-05 11:39 Geir Råness
  2004-07-05 14:58 ` maarten van den Berg
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Geir Råness @ 2004-07-05 11:39 UTC (permalink / raw)
  To: linux-raid

Hi

When I woke up this morning I noticed that my ps aux suddenly stopped
partway through the process listing, and it refuses to abort or complete.
That made me worry that something was wrong, and I saw in dmesg that one
of my software RAID arrays had logged errors during the night.

hdn: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdn: dma_intr: error=0x40 { UncorrectableError }, LBAsect=51001032, high=3, low=669384, sector=51001032
end_request: I/O error, dev 58:40 (hdn), sector 51001032

So I tried to unmount the device with umount -f /some/fuckedArray,
but it refuses to let me unmount it.

So is there any way to force the RAID array to shut down?
raidstop won't stop it because the array is still "active".

Best Regards
Geir Råness
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Really fucked up raid0 array
  2004-07-05 11:39 Really fucked up raid0 array Geir Råness
@ 2004-07-05 14:58 ` maarten van den Berg
  2004-07-05 15:02   ` Geir Råness
       [not found]   ` <40E96CA5.9030500@pulz.no>
  2004-07-05 16:11 ` TJ Harrell
  2004-07-05 18:06 ` Really messed " Guy
  2 siblings, 2 replies; 9+ messages in thread
From: maarten van den Berg @ 2004-07-05 14:58 UTC (permalink / raw)
  To: linux-raid

On Monday 05 July 2004 13:39, Geir Råness wrote:
> Hi
>
> When I woke up this morning I noticed that my ps aux suddenly stopped
> partway through the process listing, and it refuses to abort or complete.
> That made me worry that something was wrong, and I saw in dmesg that one
> of my software RAID arrays had logged errors during the night.
>
> hdn: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hdn: dma_intr: error=0x40 { UncorrectableError }, LBAsect=51001032,
> high=3, low=669384, sector=51001032
> end_request: I/O error, dev 58:40 (hdn), sector 51001032
>
> So I tried to unmount the device with umount -f /some/fuckedArray,
> but it refuses to let me unmount it.

Maybe you cannot umount it because it's still in use? In that case, run
'lsof | grep <mountpoint>' to see which processes have files open on that
mountpoint, and terminate those processes first.
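
If lsof itself is unavailable or stuck, roughly the same information can be read from /proc. Below is a minimal Python sketch, assuming a Linux /proc layout. The function names (is_under, pids_using) and the example mount point /mnt/array are made up for illustration. It only calls readlink() on the per-process cwd, root and fd links, which should avoid touching the troubled filesystem itself, though that is an assumption and not a guarantee.

```python
import os


def is_under(path, mountpoint):
    """True if path is the mount point itself or lies below it."""
    mountpoint = mountpoint.rstrip("/") or "/"
    if mountpoint == "/":
        return path.startswith("/")
    return path == mountpoint or path.startswith(mountpoint + "/")


def pids_using(mountpoint, proc_root="/proc"):
    """Map pid -> list of link targets (cwd, root, open fds) under mountpoint."""
    users = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        base = os.path.join(proc_root, entry)
        links = [os.path.join(base, "cwd"), os.path.join(base, "root")]
        fd_dir = os.path.join(base, "fd")
        try:
            links += [os.path.join(fd_dir, name) for name in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited, or we lack permission to list its fds
        for link in links:
            try:
                target = os.readlink(link)  # reads the link only, no stat()
            except OSError:
                continue
            if is_under(target, mountpoint):
                users.setdefault(int(entry), []).append(target)
    return users


if __name__ == "__main__":
    # /mnt/array is a placeholder; substitute the real mount point.
    for pid, targets in sorted(pids_using("/mnt/array").items()):
        print(pid, targets)
```

Run it as root to see other users' processes; without root, entries for processes you do not own are silently skipped.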

Maarten

-- 
When I answered where I wanted to go today, they just hung up -- Unknown


* Re: Really fucked up raid0 array
  2004-07-05 14:58 ` maarten van den Berg
@ 2004-07-05 15:02   ` Geir Råness
       [not found]   ` <40E96CA5.9030500@pulz.no>
  1 sibling, 0 replies; 9+ messages in thread
From: Geir Råness @ 2004-07-05 15:02 UTC (permalink / raw)
  To: linux-raid

maarten van den Berg wrote:

>On Monday 05 July 2004 13:39, Geir Råness wrote:
>>Hi
>>
>>When I woke up this morning I noticed that my ps aux suddenly stopped
>>partway through the process listing, and it refuses to abort or complete.
>>That made me worry that something was wrong, and I saw in dmesg that one
>>of my software RAID arrays had logged errors during the night.
>>
>>hdn: dma_intr: status=0x51 { DriveReady SeekComplete Error }
>>hdn: dma_intr: error=0x40 { UncorrectableError }, LBAsect=51001032,
>>high=3, low=669384, sector=51001032
>>end_request: I/O error, dev 58:40 (hdn), sector 51001032
>>
>>So I tried to unmount the device with umount -f /some/fuckedArray,
>>but it refuses to let me unmount it.
>
>Maybe you cannot umount it because it's still in use ?  In that case, run
>'lsof | grep <mountpoint>' to see what resources use files on that mountpoint,
>and terminate these processes first.
>
>Maarten
>
Way ahead of you.
lsof freezes too, so I'm not able to find out what is using the disk.

All programs like:
ps
w
finger
who
lsof
ls

and things like that freeze.

I also tried killing programs that might be using the disk with
killall -9 blahblah, and that freezes too :)


Best Regards
Geir Råness

[parent not found: <40E96CA5.9030500@pulz.no>]

* Re: Really fucked up raid0 array
       [not found]   ` <40E96CA5.9030500@pulz.no>
@ 2004-07-05 18:29     ` maarten van den Berg
  2004-07-06  7:22       ` Mark Overmeer
  0 siblings, 1 reply; 9+ messages in thread
From: maarten van den Berg @ 2004-07-05 18:29 UTC (permalink / raw)
  To: Geir Råness, linux-raid

On Monday 05 July 2004 16:58, you wrote:
> maarten van den Berg wrote:
> >On Monday 05 July 2004 13:39, Geir Råness wrote:

> >Maybe you cannot umount it because it's still in use ?  In that case, run
> >'lsof | grep <mountpoint>' to see what resources use files on that
> > mountpoint, and terminate these processes first.

> Way ahead of you.
> lsof freezes too, so I'm not able to find out what is using the disk.
>
> All programs like:
> ps
> w
> finger
> who
> lsof
> ls
>
> and things like that freeze.

Hm...  Sorry to hear that.  I've had those same effects happen to me 
sometimes, and I always wondered if there isn't a way to get out of the 
predicament...  But I have not found any other way than reset, yet.

I don't remember when this happened exactly, but one surefire way I know to
trigger it is doing a 'df' when you've mounted an NFS share (without "intr")
and you lost connectivity to that share. Not only does the process hang, but
any subsequent attempt to kill that process hangs, too. Wash, rinse...  :-(
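
Processes stuck like this usually sit in uninterruptible sleep (state D in ps), where signals, SIGKILL included, are not acted on until the blocked I/O returns. Here is a rough Python sketch for listing them straight from /proc/<pid>/stat, assuming the Linux stat format "pid (comm) state ...". The helper names are hypothetical, and on a badly wedged machine even these reads could in principle block.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left].strip())
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def uninterruptible(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for processes in state D."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process vanished or the line was not parseable
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)


if __name__ == "__main__":
    for pid, comm in uninterruptible():
        print(pid, comm)
```

A process that shows up here on every run, rather than only briefly, is a good candidate for the one holding the mount busy.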

My guess is, this is a kernel in a severe state of panic. Maybe swap was on
that missing array(?), but a host of other reasons could've led to this too.

> I also tried killing programs that might be using the disk with
> killall -9 blahblah, and that freezes too :)

I know the feeling.  At this point, the only thing I can suggest is to either
try to salvage things with the SysRq keys [if enabled], or else run shutdown
and, once shutdown hangs too (it probably will, in my experience), press
reset when disk activity seems to have stopped.
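
On kernels that provide /proc/sysrq-trigger, the SysRq actions can also be requested without the keyboard by writing the key letter to that file as root. The Python sketch below assumes that file exists (the 2004-era kernel in this thread may not have it) and that SysRq is enabled via /proc/sys/kernel/sysrq. The function name and the dry_run default are illustrative choices: with dry_run=True nothing is written at all.

```python
import time

# s = emergency sync, u = remount all filesystems read-only, b = reboot now
SYSRQ_STEPS = [("s", "sync"), ("u", "remount read-only"), ("b", "reboot")]


def emergency_restart(trigger="/proc/sysrq-trigger", delay=2.0, dry_run=True):
    """Issue sync, remount-ro and reboot via SysRq; return the keys in order."""
    issued = []
    for key, _meaning in SYSRQ_STEPS:
        if not dry_run:
            with open(trigger, "w") as f:
                f.write(key)
            time.sleep(delay)  # give sync / remount a moment before the next step
        issued.append(key)
    return issued


if __name__ == "__main__":
    # Dry run only: prints the sequence without touching the system.
    print(emergency_restart())
```

The 'b' step reboots immediately without a clean shutdown, so this is a last resort in the same spirit as the reset button, just with a sync and read-only remount attempted first.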

But maybe more enlightened people here have better suggestions...? 

Maarten

-- 
When I answered where I wanted to go today, they just hung up -- Unknown


* Re: Really fucked up raid0 array
  2004-07-05 18:29     ` maarten van den Berg
@ 2004-07-06  7:22       ` Mark Overmeer
  0 siblings, 0 replies; 9+ messages in thread
From: Mark Overmeer @ 2004-07-06  7:22 UTC (permalink / raw)
  To: maarten van den Berg; +Cc: linux-raid

* maarten van den Berg (maarten@ultratux.net) [040705 20:22]:
> > >Maybe you cannot umount it because it's still in use ?  In that case, run
> > >'lsof | grep <mountpoint>' to see what resources use files on that
> > > mountpoint, and terminate these processes first.
> 
> > Way ahead of you.
> > lsof freezes too, so I'm not able to find out what is using the disk.
> >
> > All programs like:
> > ps
> > w
> > finger
> > who
> > lsof
> > ls
> >
> > and things like that freeze.

From what I recall of a little investigation way back, when a hanging NFS
mount froze the system all the time, this has to do with your mount point...

Most programs (I do not understand why, but on really nearly all systems)
call getcwd() to get their current working directory.  getcwd() is
quite silly: it does  cd ..; cd ..; cd ..  until it arrives at /
and then it descends back into the tree based on the inodes of the
directories it encountered.  This way, the absolute path (without
symbolic links) of the current directory is found.

Well, a problem appears when jumping up (with cd ..) over a mount point,
because the root inode of each file system has number 2, so the inode
number alone no longer identifies the directory.  In that case, when
descending back to figure out the path, the directory which contains the
mount point needs to be scanned in detail: each mount point in that
directory is asked for its device number.
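
The walk described above can be sketched in Python. This is an illustration of the traditional algorithm, not a copy of any particular libc; on Linux the C library normally asks the kernel directly through the getcwd system call, so this may not be what ran on the poster's machine. The function name is made up. A real implementation can often compare the inode numbers returned by readdir() and only needs a per-entry lstat() when crossing a mount point; the sketch always uses lstat() for brevity. That per-entry lstat() is the call that can block when a sibling entry is a dead mount.

```python
import os


def classic_getcwd(start="."):
    """Rebuild the absolute path of `start` by walking up through '..'."""
    names = []
    here = start
    here_st = os.stat(here)
    while True:
        parent = os.path.join(here, "..")
        parent_st = os.stat(parent)
        # At the root, '..' refers to the same directory as '.'.
        if (parent_st.st_dev, parent_st.st_ino) == (here_st.st_dev, here_st.st_ino):
            break
        found = None
        for name in os.listdir(parent):
            try:
                # One lstat() per sibling: this touches every mount point
                # in the parent directory, healthy or not.
                st = os.lstat(os.path.join(parent, name))
            except OSError:
                continue
            if (st.st_dev, st.st_ino) == (here_st.st_dev, here_st.st_ino):
                found = name
                break
        if found is None:
            raise FileNotFoundError("cannot find %r in its parent" % here)
        names.append(found)
        here, here_st = parent, parent_st
    return "/" + "/".join(reversed(names))


if __name__ == "__main__":
    print(classic_getcwd("."))
    print(os.getcwd())  # the kernel's answer, for comparison
```

Matching on the (device, inode) pair instead of the inode alone is what makes the lookup unambiguous across file systems whose root directories share an inode number.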

Asking for the device number of a stale NFS mount or a RAID set in an
illegal state may have different effects.  It is simply an implementation
issue in the driver.  In some cases it blocks (waiting for NFS or RAID to
come up) and in others it results in an error.  Both have their own
advantage: for instance, a network connection may be lost for a few
seconds, and you do not want all programs to crash immediately because
their NFS data is lost.  But when a remote NFS server is down for a long
time, you may want to get an error as fast as possible.  Or at least you
would like to be able to interrupt the waiting process.  (In traditional
UNIX systems, you cannot interrupt processes which are waiting in the
kernel.  NFS is in the kernel, so on those systems you cannot interrupt
processes waiting for a response from the NFS server: stale NFS.)

So, it is really smart (as a general rule of thumb for the average UNIX
system) to keep the mount points of subsystems and network file systems
away from the part of the tree that holds normal commands.  So: do not
mount in /, but for instance in /mnt.  Then, if you need to, simplify the
path for the users by creating symlinks.

    /home -> /mnt/home
    /mnt/home is RAID array mount-point
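
One reason this layout helps: a program scanning / only needs lstat() on the /home symlink, and lstat() reports on the link itself without ever visiting /mnt/home. Below is a small Python sketch of that behaviour, using a throwaway directory and a deliberately missing target so it is safe to run. The function name and paths are illustrative only.

```python
import os
import stat
import tempfile


def publish_mount(real_mount, public_path):
    """Create public_path as a symlink to real_mount and lstat() the link."""
    os.symlink(real_mount, public_path)
    # lstat() inspects the link, not its target, so it still succeeds
    # when the target is missing or unreachable.
    return os.lstat(public_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        link = os.path.join(root, "home")
        st = publish_mount(os.path.join(root, "mnt-home-not-mounted"), link)
        print("is symlink:", stat.S_ISLNK(st.st_mode))
        print("target reachable:", os.path.exists(link))
```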

OK, long story, which may or may not have any relation to your
problem.  However, the behavior you report is very characteristic of
this problem.
-- 
               MarkOv

------------------------------------------------------------------------
drs Mark A.C.J. Overmeer                                MARKOV Solutions
       Mark@Overmeer.net                          solutions@overmeer.net
http://Mark.Overmeer.net                   http://solutions.overmeer.net


* Re: Really fucked up raid0 array
  2004-07-05 11:39 Really fucked up raid0 array Geir Råness
  2004-07-05 14:58 ` maarten van den Berg
@ 2004-07-05 16:11 ` TJ Harrell
  2004-07-05 18:06 ` Really messed " Guy
  2 siblings, 0 replies; 9+ messages in thread
From: TJ Harrell @ 2004-07-05 16:11 UTC (permalink / raw)
  To: linux-raid

This is a long shot, but it's bitten me before... If you log in, and then su
to root while your cwd is on the raid array, that bash shell will keep the
raid array busy. It backgrounds and hangs around while you're su'd. Also,
mount has an option to remount read-only for cases where you can't unmount.
Not really sure where you could go from there, though.
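
The remount option mentioned here is `mount -o remount,ro <mountpoint>`. Below is a minimal Python wrapper, assuming root privileges and a placeholder mount point. The function name and the dry_run default are illustrative. On a wedged array the real call may well block like everything else, so by default it only returns the command it would run.

```python
import subprocess


def remount_readonly(mountpoint, dry_run=True):
    """Build (and optionally run) the command to remount mountpoint read-only."""
    cmd = ["mount", "-o", "remount,ro", mountpoint]
    if not dry_run:
        # Raises CalledProcessError if mount exits non-zero (e.g. still busy).
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # /mnt/array is a placeholder; this only prints the command.
    print(" ".join(remount_readonly("/mnt/array")))
```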


* RE: Really messed up raid0 array
  2004-07-05 11:39 Really fucked up raid0 array Geir Råness
  2004-07-05 14:58 ` maarten van den Berg
  2004-07-05 16:11 ` TJ Harrell
@ 2004-07-05 18:06 ` Guy
  2004-07-05 19:06   ` Gregory Leblanc
  2 siblings, 1 reply; 9+ messages in thread
From: Guy @ 2004-07-05 18:06 UTC (permalink / raw)
  To: linux-raid

I have some ideas...oops you said the f word, bye.


* RE: Really messed up raid0 array
  2004-07-05 18:06 ` Really messed " Guy
@ 2004-07-05 19:06   ` Gregory Leblanc
  2004-07-05 21:01     ` Guy
  0 siblings, 1 reply; 9+ messages in thread
From: Gregory Leblanc @ 2004-07-05 19:06 UTC (permalink / raw)
  Cc: linux-raid

On Mon, 2004-07-05 at 11:06, Guy wrote:
> I have some ideas...oops you said the f word, bye.

If you're not going to contribute, please don't post at all.  Thanks,
	Greg

[snip]



* RE: Really messed up raid0 array
  2004-07-05 19:06   ` Gregory Leblanc
@ 2004-07-05 21:01     ` Guy
  0 siblings, 0 replies; 9+ messages in thread
From: Guy @ 2004-07-05 21:01 UTC (permalink / raw)
  Cc: linux-raid

Does your message contribute?
I guess you are a "do as I say, not as I do" person!

