linux-raid.vger.kernel.org archive mirror
* Question: how to identify failing disk in a RAID1
@ 2008-04-13 19:14 Maurice Hilarius
  2008-04-13 19:29 ` Justin Piszcz
  0 siblings, 1 reply; 16+ messages in thread
From: Maurice Hilarius @ 2008-04-13 19:14 UTC (permalink / raw)
  To: linux-raid

Hi there.

Recently I have frequently been seeing a damaged filesystem on a RAID1 
on boot.
A lengthy fsck does get it working, but files are disappearing 
as a result.

I am pretty sure that one of the drives has developed some issues and 
needs to be replaced.

How does one identify which of the 2 disks is the one that is failing?

The system has 2 identical disks, and / is on md0.

fstab:
/dev/md0                /                       ext3    defaults        1 1
LABEL=/boot1            /boot                   ext2    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
LABEL=/boot11           /boot1                  ext2    defaults        1 2
LABEL=SWAP-sdb3         swap                    swap    defaults        0 0
LABEL=SWAP-sda2         swap                    swap    defaults        0 0

fdisk -l shows me:
Disk /dev/sda: 400.0 GB, 400088457216 bytes
255 heads, 63 sectors/track, 48641 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14         535     4192965   82  Linux swap / Solaris
/dev/sda3             536       48641   386411445   fd  Linux raid autodetect

Disk /dev/sdb: 400.0 GB, 400088457216 bytes
255 heads, 63 sectors/track, 48641 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104391   83  Linux
/dev/sdb2              14       48118   386403412+  fd  Linux raid autodetect
/dev/sdb3           48119       48640     4192965   82  Linux swap / Solaris

Disk /dev/md0: 395.6 GB, 395677007872 bytes
2 heads, 4 sectors/track, 96600832 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Anyone have a suggestion, please?
Responses off list are probably most appropriate.

Thanks for any help.
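
A first check for the question above is whether md itself has already flagged a member. The sketch below parses the text of /proc/mdstat and reports, per array, the member devices, any marked failed with "(F)", and the [UU]-style status string. The function name parse_mdstat and the returned dictionary layout are illustrative assumptions, not part of any md tooling, and a clean result only means md has not kicked a disk out; it does not prove both disks are healthy.

```python
import re

# Matches member entries such as "sda3[0]" or "sda3[0](F)".
MEMBER_RE = re.compile(r"(\w+)\[(\d+)\](\([A-Z]\))?")


def parse_mdstat(text):
    """Return {array: {"members": [...], "failed": [...], "status": str or None}}."""
    arrays = {}
    current = None
    for line in text.splitlines():
        head = re.match(r"^(md\w+)\s*:\s*(.*)$", line)
        if head:
            current = head.group(1)
            members = MEMBER_RE.findall(head.group(2))
            arrays[current] = {
                "members": [name for name, _, _ in members],
                # "(F)" after a member marks it as faulty.
                "failed": [name for name, _, flag in members if flag == "(F)"],
                "status": None,
            }
            continue
        # The line after the array header carries e.g. "[2/2] [UU]".
        status = re.search(r"\[([U_]+)\]", line)
        if current and status:
            arrays[current]["status"] = status.group(1)
    return arrays


if __name__ == "__main__":
    with open("/proc/mdstat") as handle:
        for name, info in parse_mdstat(handle.read()).items():
            print(name, info)
```

An underscore in the status string means a slot is missing or failed; "UU" means md considers both mirrors active.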

-- 
Regards, Maurice
mhilarius@gmail.com




* Re: Question: how to identify failing disk in a RAID1
@ 2008-04-18 17:36 David Lethe
  0 siblings, 0 replies; 16+ messages in thread
From: David Lethe @ 2008-04-18 17:36 UTC (permalink / raw)
  To: Bill Davidsen, Maurice Hilarius; +Cc: vger majordomo for lists

The symptoms are indicative of a standard bad block reallocation. Depending on make, model, firmware rev, and even the location of the new defect, it could take several seconds for the disk to grab a spare from the reserved area and fix the defect. No reason for concern... The system worked like it was designed to.
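
If reallocation is the suspect, the per-disk SMART counters are one place to look for it. The sketch below runs "smartctl -A" (from smartmontools, normally as root) against a device and pulls the raw values out of the attribute table. The helper names and the WATCHED list are illustrative assumptions; attribute names and raw-value formats vary by vendor, so treat a nonzero count as a hint about which disk to test further, not as a verdict.

```python
import subprocess

# Attribute names commonly reported by smartmontools; vendors differ,
# so this list is an assumption rather than a complete set.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Map attribute name -> integer raw value from `smartctl -A` output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            attrs[fields[1]] = int(fields[9])
        except ValueError:
            continue  # raw value is not a plain integer; skip it
    return attrs


def read_smart_attributes(device):
    """Run smartctl on a device such as /dev/sda and parse its attribute table."""
    # No check=True: smartctl uses its exit status as a bitmask of findings.
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return parse_smart_attributes(result.stdout)


def suspicious(attrs):
    """Return the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}


if __name__ == "__main__":
    for dev in ("/dev/sda", "/dev/sdb"):
        print(dev, suspicious(read_smart_attributes(dev)))
```

Comparing the two disks side by side is the point: a reallocation count that is nonzero or growing on only one of them would single that disk out.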

-----Original Message-----

From:  "Bill Davidsen" <davidsen@tmr.com>
Subj:  Re: Question: how to identify failing disk in a RAID1
Date:  Fri Apr 18, 2008 8:15 am
Size:  2K
To:  "Maurice Hilarius" <maurice@harddata.com>
cc:  "vger majordomo for lists" <linux-raid@vger.kernel.org>

Maurice Hilarius wrote: 
> Bill Davidsen wrote: 
>> Maurice Hilarius wrote: 
>>> Morning Bill. 
>>> 
>>> BTW, I want to say "Thanks for your help with this" first. 
>>> Just in case I forgot. 
>>> 
>>> So, I ran "check" once. It complained, and failed. 
>>> 
>> Does the failure provide any useful information? 
>> 
> No. 
> Here is what I got the first time: 
> 
> [root@localhost md]# echo check >sync_action; cat mismatch_cnt 
> -bash: echo: write error: Device or resource busy 
> 0 
> 
> Later, on my second try, a few hours later, it worked, reporting no error. 
> .. 
> [maurice@localhost ~]$ su - 
> Password: 
> [root@localhost ~]# cd /sys/block/md0/md 
> [root@localhost md]# cat /proc/mdstat 
> Personalities : [raid1] [raid6] [raid5] [raid4] 
> md0 : active raid1 sda3[0] sdb2[1] 
>       386403328 blocks [2/2] [UU] 
> 
> unused devices: <none> 
> [root@localhost md]# echo check >sync_action; cat mismatch_cnt 
> 0 
> 
>> 
>> I think it's time to be keeping a good backup, and hopefully someone  
>> else has a good thought on running this down more. 
>> 
> Thanks, updated that backup at the first sign of trouble 
>>> Any thoughts on that? 
>> 
>> The only thought I have at the moment is marginal power supply, and  
>> that's just because it can generate all manner of odd behaviors,  
>> rather than any other hints. Sorry. 
>> 
> Yeah. I am going to replace *both* disks, and then run the  
> manufacturer's utility (Seatest) on them. 
>> If you aren't getting errors from SMART or logs, and I don't remember  
>> you sending me that info, I'm not sure how you determine which drive  
>> is the problem. 
> Exactly. 
> 
> Thanks a LOT for trying, Bill.. 
 
Actually, my thought is that you may not actually be getting hardware  
errors, which is why they are not being reported by either the kernel or  
SMART. That's why I thought of memory and/or power issues, either of  
which could cause what you are seeing. 
 
Guess I have to leave it there, maybe someone else will have a thought. 
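
One caveat on the quoted "echo check >sync_action; cat mismatch_cnt" runs: the write only starts the check, which then proceeds in the background, so a mismatch_cnt read immediately afterwards most likely reflects no completed work. The "Device or resource busy" error is also what one would expect if another sync action was still running. The sketch below waits until sync_action reads "idle" before reading the counter. The function names are illustrative assumptions, and the directory is passed in (for example /sys/block/md0/md) rather than hard-coded.

```python
import time
from pathlib import Path


def read_attr(md_dir, name):
    """Read one md sysfs attribute, e.g. sync_action or mismatch_cnt."""
    return (Path(md_dir) / name).read_text().strip()


def start_check(md_dir):
    """Request a check pass; raises OSError (EBUSY) if md is already busy."""
    (Path(md_dir) / "sync_action").write_text("check\n")


def wait_for_idle(md_dir, poll_seconds=5.0, timeout=None):
    """Poll sync_action until it reads 'idle'. Return False on timeout."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while read_attr(md_dir, "sync_action") != "idle":
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


def checked_mismatch_count(md_dir, poll_seconds=5.0, timeout=None):
    """Return mismatch_cnt once the array is idle, or None on timeout."""
    if not wait_for_idle(md_dir, poll_seconds, timeout):
        return None
    return int(read_attr(md_dir, "mismatch_cnt"))


if __name__ == "__main__":
    md = "/sys/block/md0/md"
    start_check(md)
    print("mismatch_cnt after check:", checked_mismatch_count(md, poll_seconds=30.0))
```

On an array of this size the check can take hours, so the poll interval can be generous. A nonzero count on RAID1 says the two copies differ somewhere but not which disk holds the bad copy.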
 
--  
Bill Davidsen <davidsen@tmr.com> 
  "Woe unto the statesman who makes war without a reason that will still 
  be valid when the war is over..." Otto von Bismark  
 
 
 





Thread overview: 16+ messages
2008-04-13 19:14 Question: how to identify failing disk in a RAID1 Maurice Hilarius
2008-04-13 19:29 ` Justin Piszcz
2008-04-14  1:14   ` Bill Davidsen
     [not found]     ` <4802CDA2.605@harddata.com>
2008-04-14 16:38       ` Bill Davidsen
     [not found]         ` <4804CD4F.7080303@harddata.com>
2008-04-15 18:14           ` Bill Davidsen
     [not found]             ` <48050DD6.7020404@harddata.com>
     [not found]               ` <48055EFA.8060505@tmr.com>
     [not found]                 ` <480607F2.3060504@harddata.com>
2008-04-17 13:12                   ` Bill Davidsen
     [not found]                     ` <48076096.2020804@harddata.com>
2008-04-18 13:17                       ` Bill Davidsen
     [not found]     ` <480F7105.9030405@harddata.com>
2008-04-23 18:54       ` Justin Piszcz
     [not found]         ` <480F8830.6020207@harddata.com>
2008-04-23 19:26           ` Justin Piszcz
2008-04-27 17:03             ` Keith Roberts
2008-04-27 19:28               ` Richard Scobie
2008-04-28  5:29                 ` Keith Roberts
2008-04-28  6:06                   ` Michael Tokarev
2008-04-28  7:01                   ` Richard Scobie
2008-04-27 21:53               ` Mark Hahn
  -- strict thread matches above, loose matches on Subject: below --
2008-04-18 17:36 David Lethe
