* need help: corrupt files on one of my raids
@ 2009-11-10 14:07 Arild Langseid
  2009-11-10 15:34 ` Majed B.
  0 siblings, 1 reply; 10+ messages in thread

From: Arild Langseid @ 2009-11-10 14:07 UTC (permalink / raw)
To: linux-raid

Hi all!

I have a strange problem with corrupted files on my raid1 volume. (A raid5
volume on the same computer works just fine.)

One of my raids (md1) is a raid1 with two 1TB sata drives.
I am running lvm on the raid, and two of the volumes on the raid are:
/dev/vg0sata/lv0_bilderArchive
/dev/vg0sata/lv0_bilderProjects
(For your info: "bilder" in Norwegian is "pictures" in English.)

What I want:
I want to use lv0_bilderArchive to store my pictures unmodified and
lv0_bilderProjects to hold my edited pictures and projects.

My problem is:

My files are corrupted. Usually the files (crw/cr2/jpg) are stored ok, but
are corrupted later when new files/directories are added to the volume.
Sometimes the files are corrupted instantly at save time.

I discovered this first when copying from my laptop to the server via
samba. By testing I have found that this behaviour also applies when I
copy locally on the server from the raid5 (md0) to the faulty raid1 (md1)
with cp -a.

I have tested with both reiserfs and ext3 filesystems. The file corruption
happens on both reiserfs and ext3.

One of my test procedures was as follows:
1. Copied 21 pictures locally to the root of the lv0_bilderProjects
volume: first 10 pictures, then 11 more, with cp -a. All pictures survived
and were stored uncorrupted.
2. Then I copied a whole directory tree with cp -a to the
lv0_bilderProjects volume. Many pictures were corrupted, a few stored ok.
All small text files with exif info seem ok. All files in the volume root
copied in 1) are ok.
3. Then I copied one more directory tree. All pictures seem ok. Mostly jpg
this time.
4. Then I copied one more directory tree, larger this time. Now the first
21 pictures in the volume root are corrupted. All of them - and some of
them in a way that my browser can't show them at all but shows an error
message.

From these tests I conclude that samba, the network and the type of
filesystem are not the source of my problems.

I have the same problem on all lvm volumes on the raid in question (md1).

What's common and what's different on my two raids:

differences between the two raid systems:
md0 (working correctly) is a raid5, three ide disks, 200GB each.
md1 (corrupted files) is a raid1, two sata disks, 1TB each.

common:
I use lvm on both raid devices to host my filesystems.

other useful information:
I use Debian:
creator:~# cat /proc/version
Linux version 2.6.18-6-686 (Debian 2.6.18.dfsg.1-26etch1)
(dannf@debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian
4.1.1-21)) #1 SMP Thu Nov 5 16:28:13 UTC 2009

I have run apt-get update and apt-get upgrade, and everything seems to be
up to date.

The sata disks are hosted on the motherboard: ABit NF7.
The disks hosting the raid I have trouble with (md1) are Hitachi Deskstar
1TB 16MB SATA2 7200RPM, 0A38016.

The output from mdadm --detail /dev/md1 and cat /proc/mdstat seems ok, but
I can post the results here on request. The same applies to the output
from pvdisplay, vgdisplay and lvdisplay. They seem ok, but I can post on
request.

Due to the time it takes to build a 1TB raid I have not tried to use the
disks in md1 without raiding them. Is it a good idea to tear the raid down
and test the disks directly, or do any of you have other ideas to test
before I take this time-consuming action?

Any ideas out there? Links to information I should read?
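For what it's worth, my "detection" so far is just opening the pictures in
a browser. A checksum pass would confirm the corruption more objectively -
something like this sketch (the paths here are examples, not my real ones):

    # record checksums of the source tree before copying
    (cd /data/src && find . -type f -exec md5sum {} + > /tmp/src.md5)

    # copy onto the suspect raid1 volume; add more trees later, as in my tests
    cp -a /data/src/. /mnt/bilderProjects/test/

    # after the later writes, re-verify the first copy;
    # any mismatch means silent corruption
    (cd /mnt/bilderProjects/test && md5sum -c --quiet /tmp/src.md5)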
Thank heaven for my backup routines, including full copies on cold hard
drives both in my safe and off-site :-D

Thanks for all help!

Best Regards,
Arild, Oslo, Norway
* Re: need help: corrupt files on one of my raids
  2009-11-10 14:07 need help: corrupt files on one of my raids Arild Langseid
@ 2009-11-10 15:34 ` Majed B.
  [not found] ` <4AF99159.4000800@langseid.no>
  0 siblings, 1 reply; 10+ messages in thread

From: Majed B. @ 2009-11-10 15:34 UTC (permalink / raw)
To: Arild Langseid; +Cc: linux-raid

If you have smartmontools installed, run smartctl -a /dev/sdx

Look for any number that is bigger than 1 on these:
Reallocated_Event_Count
Current_Pending_Sector
Offline_Uncorrectable
UDMA_CRC_Error_Count
Raw_Read_Error_Rate
Reallocated_Sector_Ct
Load_Retry_Count

You may not have some of these. That's OK.

If you don't have the package, install it and configure it to run short
tests daily and long tests on weekends (at idle times).
To run an immediate long test, issue this command:
smartctl -t offline /dev/sdx

Note: An offline test is a long test and may take up to 20 hours. An
offline test is required to get the numbers for the parameters above.

If you're using the ext3 filesystem, it would have automatically checked
for bad sectors at the time the volume was formatted.

I would also suggest you run a fsck on your filesystems.
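For the scheduled tests, smartd can handle the calendar for you. A minimal
sketch of /etc/smartd.conf (I'm assuming your two raid1 members are
/dev/sdb and /dev/sdc; check the smartd.conf man page before relying on
this) that runs a short self-test daily at 02:00 and a long one every
Saturday at 03:00:

    # -a monitors all SMART attributes; -s schedules self-tests
    # the -s regexp fields are T/MM/DD/d/HH: type, month, day, weekday, hour
    /dev/sdb -a -s (S/../.././02|L/../../6/03)
    /dev/sdc -a -s (S/../.././02|L/../../6/03)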
On Tue, Nov 10, 2009 at 5:07 PM, Arild Langseid <arild@langseid.no> wrote:
> Hi all!
>
> I have a strange problem with corrupted files on my raid1 volume. (A
> raid5 volume on the same computer works just fine.)
> [rest of the original report trimmed]

--
Majed B.
[parent not found: <4AF99159.4000800@langseid.no>]
* Re: need help: corrupt files on one of my raids
  [not found] ` <4AF99159.4000800@langseid.no>
@ 2009-11-10 18:26 ` Majed B.
  [not found] ` <4AF9CB19.7000803@langseid.no>
  0 siblings, 1 reply; 10+ messages in thread

From: Majed B. @ 2009-11-10 18:26 UTC (permalink / raw)
To: linux-raid

Either your motherboard doesn't support SMART or, worse, your disks don't
support SMART.

I have a bunch of Hitachi disks that don't support SMART, which is very
bad since I can't monitor their health status.

Download the disk's manual and check whether it lists S.M.A.R.T.
capabilities. To read more and understand what S.M.A.R.T. is, check this:
http://en.wikipedia.org/wiki/S.M.A.R.T.

While I was searching for your disk model, I noticed a couple of links
complaining about disk failures. I didn't see whether the disk itself has
SMART or not.

You might want to check your motherboard's manual for SMART support as
well.

P.S.: Use reply-all ;)
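P.P.S.: Before concluding that the hardware lacks SMART, rule out the tool
itself: old smartctl builds sometimes talk SCSI to a SATA disk, and your
output ("Device: ATA ... Device type: disk") hints at exactly that. A
rough sequence to try - flags from memory, so check your smartctl man page:

    smartctl -i /dev/sdb          # identify; look for "SMART support is: ..."
    smartctl -s on /dev/sdb       # ask the drive to enable SMART
    smartctl -d ata -a /dev/sdb   # force ATA handling on older builds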
On Tue, Nov 10, 2009 at 7:14 PM, Arild Langseid <arild@langseid.no> wrote:
> Hi Majed!
>
> Thank you for your time to help me. I have also been thinking of a
> hardware fault.
>
> I installed smartmontools, but unfortunately I got this result:
>
> creator:~# smartctl -a /dev/sdb
> smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> Device: ATA Hitachi HDT72101 Version: ST6O
> Serial number: STF604MH0K4X0B
> Device type: disk
> Local Time is: Tue Nov 10 17:43:32 2009 CET
> Device does not support SMART
>
> Error Counter logging not supported
>
> [GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
> Device does not support Self Test logging
> creator:~#
>
> Is "smart" something I have to enable?
>
> I have checked my bios, and did not find anything regarding smart there.
>
> Best Regards,
> Arild
>
> Majed B. wrote:
>> If you have smartmontools installed, run smartctl -a /dev/sdx
>> [earlier advice trimmed]

--
Majed B.
[parent not found: <4AF9CB19.7000803@langseid.no>]
* Re: need help: corrupt files on one of my raids
  [not found] ` <4AF9CB19.7000803@langseid.no>
@ 2009-11-10 20:31 ` Majed B.
  [not found] ` <4AF9D194.6000306@langseid.no>
  0 siblings, 1 reply; 10+ messages in thread

From: Majed B. @ 2009-11-10 20:31 UTC (permalink / raw)
To: LinuxRaid

The numbers will be reported as zeros if you have never run an offline
test before. Run it, and then you'll get to see whether you have bad
sectors or not.

Have you tried running a filesystem check? (fsck)
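Concretely, something along these lines (a sketch - the device and LV
names are taken from your earlier mails, so adjust to match your system):

    smartctl -t offline /dev/sdb    # start the offline data collection test
    smartctl -t offline /dev/sdc    # it runs in the background on the drive
    # many hours later, re-read the attributes and the error log:
    smartctl -a /dev/sdb
    smartctl -a /dev/sdc

    # filesystem check: unmount the volume first
    umount /dev/vg0sata/lv0_bilderProjects
    fsck /dev/vg0sata/lv0_bilderProjects   # or reiserfsck --check on reiserfs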
On Tue, Nov 10, 2009 at 11:20 PM, Arild Langseid <arild@langseid.no> wrote:
> Hi and thanks again!
>
> I did not find the feature list for my disk either. Instead I found that
> my smartmontools was very old.
> I upgraded my Debian Etch to Debian Lenny (took some time...), and now
> smartctl works. I was lucky about the SMART feature on my disks and
> motherboard.
>
> Output for /dev/sdb:
>   1 Raw_Read_Error_Rate     0x000b 100 100 016 Pre-fail Always  -  0
>   5 Reallocated_Sector_Ct   0x0033 100 100 005 Pre-fail Always  -  0
> 196 Reallocated_Event_Count 0x0032 100 100 000 Old_age  Always  -  0
> 197 Current_Pending_Sector  0x0022 100 100 000 Old_age  Always  -  0
> 198 Offline_Uncorrectable   0x0008 100 100 000 Old_age  Offline -  0
> 199 UDMA_CRC_Error_Count    0x000a 200 200 000 Old_age  Always  -  0
>
> SMART Error Log Version: 1
> No Errors Logged
>
> and /dev/sdc:
>   1 Raw_Read_Error_Rate     0x000b 100 100 016 Pre-fail Always  -  0
>   5 Reallocated_Sector_Ct   0x0033 100 100 005 Pre-fail Always  -  0
> 196 Reallocated_Event_Count 0x0032 100 100 000 Old_age  Always  -  0
> 197 Current_Pending_Sector  0x0022 100 100 000 Old_age  Always  -  0
> 198 Offline_Uncorrectable   0x0008 100 100 000 Old_age  Offline -  0
> 199 UDMA_CRC_Error_Count    0x000a 200 200 000 Old_age  Always  -  0
>
> SMART Error Log Version: 1
> No Errors Logged
>
> Seems ok to me. Do you agree?
>
> After upgrading to Debian Lenny I still get corrupted files, though :(
>
> Best Regards,
> Arild
>
> Majed B. wrote:
>> Either your motherboard doesn't support SMART or, worse, your disks
>> don't support SMART.
>> [earlier discussion trimmed]
--
Majed B.
[parent not found: <4AF9D194.6000306@langseid.no>]
* Re: need help: corrupt files on one of my raids
  [not found] ` <4AF9D194.6000306@langseid.no>
@ 2009-11-11 2:29 ` Majed B.
  2009-11-11 4:16 ` Michael Evans
  0 siblings, 1 reply; 10+ messages in thread

From: Majed B. @ 2009-11-11 2:29 UTC (permalink / raw)
To: LinuxRaid

If you have no data on the volumes, would you mind reformatting them with
a different filesystem (ext3/ext4/xfs) to see if you still get data
corruption? (A command sketch follows at the bottom of this mail.)

On Tue, Nov 10, 2009 at 11:48 PM, Arild Langseid <arild@langseid.no> wrote:
> Yes, I have run a fsck:
>
> ###########
> reiserfsck --check started at Tue Nov 10 22:15:27 2009
> ###########
> Replaying journal..
> Reiserfs journal '/dev/mapper/vg0sata-lv0_multimedia' in blocks [18..8211]:
> 0 transactions replayed
> Checking internal tree..finished
> Comparing bitmaps..finished
> Checking Semantic tree:
> finished
> No corruptions found
> There are on the filesystem:
>         Leaves 82
>         Internal nodes 1
>         Directories 7
>         Other files 101
>         Data block pointers 77964 (0 of them are zero)
>         Safe links 0
> ###########
> reiserfsck finished at Tue Nov 10 22:15:29 2009
> ###########
>
> Seems ok to me.
>
> I will now run the offline checks you suggested earlier.
>
> As it is no longer clear that I have a raid problem, as I first thought:
> is there any other mailing list I should ask about my problem, or is it
> ok to continue here?
>
> Best Regards,
> Arild
>
> Majed B. wrote:
>> The numbers will be reported as zeros if you have never run an offline
>> test before.
>> [earlier discussion trimmed]
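The sketch mentioned above (the LV path is taken from your earlier mails,
the mkfs flags are from memory, so double-check the man pages - and this
of course wipes the volume):

    umount /dev/vg0sata/lv0_bilderProjects
    mkfs.xfs -f /dev/vg0sata/lv0_bilderProjects    # try xfs, or:
    mkfs.ext3 -c /dev/vg0sata/lv0_bilderProjects   # -c scans for bad blocks
                                                   # during formatting

--
Majed B.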
* Re: need help: corrupt files on one of my raids
  2009-11-11 2:29 ` Majed B.
@ 2009-11-11 4:16 ` Michael Evans
  2009-11-11 8:06 ` Arild Langseid
  2009-11-11 8:15 ` Arild Langseid
  0 siblings, 2 replies; 10+ messages in thread

From: Michael Evans @ 2009-11-11 4:16 UTC (permalink / raw)
To: Majed B.; +Cc: LinuxRaid

One other thing besides what's already been mentioned: you are seeing
issues with your raid1 and -not- your raid5 volume. If you run
mdadm -D /dev/md(whatever the number is), what version (for the
superblock) is reported? Preferably you will be using superblock format
1.1 or 1.2 (ideally 1.1). The only reasons to use the 1.0 or 0.9 formats
are special cases, such as for /boot-style volumes.
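To check both the assembled array and its members, something like this
(a sketch; the member names are guesses, use whatever /proc/mdstat lists):

    mdadm -D /dev/md1 | grep -i version     # superblock version of the array
    mdadm -E /dev/sdb1 | grep -i version    # examine one member's superblock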
* Re: need help: corrupt files on one of my raids
  2009-11-11 4:16 ` Michael Evans
@ 2009-11-11 8:06 ` Arild Langseid
  0 siblings, 0 replies; 10+ messages in thread

From: Arild Langseid @ 2009-11-11 8:06 UTC (permalink / raw)
To: LinuxRaid; +Cc: Michael Evans, Majed B.

Hi Michael and Majed!

Thank you very much for your help!

One good fellow in Australia recognized my motherboard as the source of
my problems, and sent me the message below privately, as he does not have
write permission to the list.

As Tony writes, my problems started when I connected the second drive and
started raiding it. The raid5 volume is on two separate Promise IDE
controllers, which is why I have no problems with that one.

Tony wrote:
> Hi Arild -
>
> Greetings from Australia...
>
> As I only have read access to the Raid List - I hope you do not mind my
> replying to you directly.
>
> I understand that you are using an Abit NF7 motherboard with two onboard
> Silicon Image 3112 SATA ports (which makes it a NF7-S) to which are
> connected two SATA drives - and you are having data corruption problems
> with these particular drives.
>
> This is a known problem with this motherboard. If you connect only one
> drive - no problem. However, connect two - and you have problems!
>
> I actually have one of these motherboards, and to solve the problem I
> disabled the onboard SATA and installed a SATA controller in one of the
> PCI slots. It was the quickest and easiest solution. I needed more than
> two SATA ports anyway - so it wasn't a problem. I just installed a
> 4-port card.
>
> If you search on the Internet there are a number of discussions on this.
> Examples, from just a very quick search - you may find a good fix; I
> took the easy way out!
>
> http://www.techspot.com/vb/all/windows/t-5278-SATA-RAID-data-corruption-problem-update.html
> http://www.nforcershq.com/forum/image-vp65255.html
> http://www.tomshardware.com/forum/103163-30-bios-update-problem
>
> Good Luck, Tony

Thank you very much for all the help.

Best Regards,
Arild

Michael Evans wrote:
> One other thing besides what's already been mentioned. You are seeing
> issues with Raid1 and -not- your Raid5 volume.
> [rest trimmed]
* Re: need help: corrupt files on one of my raids
  2009-11-11 4:16 ` Michael Evans
  2009-11-11 8:06 ` Arild Langseid
@ 2009-11-11 8:15 ` Arild Langseid
  2009-11-11 8:25 ` Leslie Rhorer
  2009-11-11 8:31 ` Michael Evans
  1 sibling, 2 replies; 10+ messages in thread

From: Arild Langseid @ 2009-11-11 8:15 UTC (permalink / raw)
To: Michael Evans; +Cc: LinuxRaid

Hi Michael:

I went ahead and ran your suggestion anyway, and got this result:

creator:~# mdadm -D /dev/md1
/dev/md1:
        Version : 00.90

Is it a big problem to be running version 0.9? I use the version of the
tools that comes with Debian - and as they are somewhat conservative,
their versions lag somewhat behind.

I have now upgraded my Debian Etch to Debian Lenny and have this mdadm:
creator:~# mdadm --version
mdadm - v2.6.7.2 - 14th November 2008

When I fix the firmware issues on my motherboard or buy a separate sata
controller... will the superblock be one of the versions you suggest when
I create the raid again? If not, is it a big enough problem that I should
upgrade my tools beyond what Debian provides?

Best Regards,
Arild

Michael Evans wrote:
> One other thing besides what's already been mentioned. You are seeing
> issues with Raid1 and -not- your Raid5 volume.
> [rest trimmed]
* RE: need help: corrupt files on one of my raids
  2009-11-11 8:15 ` Arild Langseid
@ 2009-11-11 8:25 ` Leslie Rhorer
  0 siblings, 0 replies; 10+ messages in thread

From: Leslie Rhorer @ 2009-11-11 8:25 UTC (permalink / raw)
To: linux-raid

> creator:~# mdadm -D /dev/md1
> /dev/md1:
>         Version : 00.90
>
> Is it a big problem to be running version 0.9? I use the version of the
> tools that comes with Debian - and as they are somewhat conservative,
> their versions lag somewhat behind.

Debian is very conservative, but that's not the issue here.

> I have now upgraded my Debian Etch to Debian Lenny and have this mdadm:
> creator:~# mdadm --version
> mdadm - v2.6.7.2 - 14th November 2008
>
> When I fix the firmware issues on my motherboard or buy a separate sata
> controller... will the superblock be one of the versions you suggest
> when I create the raid again?

It will, but not only with Debian. All current releases of mdadm still
default to a 0.9 superblock. Neil has been talking about changing that,
but as of now it is still true for all versions of mdadm. I think it was
even true for mdadm 3.1, which was withdrawn.

> If not, is it a big enough problem that I should upgrade my tools beyond
> what Debian provides?

There's nothing truly horrible about a 0.9 superblock, unless your RAID
array is going to grow to be quite large. The existing tools allow a
version 1.x superblock, but you must select the version yourself when you
create the array. If you don't, it will default to 0.9.
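For reference, selecting the format at creation time looks roughly like
this (a sketch - the member names are hypothetical, and --create destroys
any existing superblock and data on the devices you list):

    mdadm --create /dev/md1 --metadata=1.1 --level=1 --raid-devices=2 \
          /dev/sdb1 /dev/sdc1

--metadata (short form -e) accepts 0.90, 1.0, 1.1 and 1.2 in this
generation of mdadm.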
* Re: need help: corrupt files on one of my raids
  2009-11-11 8:15 ` Arild Langseid
  2009-11-11 8:25 ` Leslie Rhorer
@ 2009-11-11 8:31 ` Michael Evans
  1 sibling, 0 replies; 10+ messages in thread

From: Michael Evans @ 2009-11-11 8:31 UTC (permalink / raw)
To: Arild Langseid; +Cc: LinuxRaid

Besides the man-page-documented issues with 0.90 superblocks, there is
the fact that they are stored at the -end- of the partition. This is a
good thing for boot loaders like grub, which would like read-only access
to partitions to look at file-systems and read data. However, at the same
time it is a -VERY- bad thing, because each member of that raid1 array
looks like its own file-system. Think about what happens when one copy of
that file-system is changed and the other(s) is (are) not.

If you're careful, or don't care about the data in that partition (the
data in /boot is generally very -nice- to have, but the system can be
recovered and that data regenerated in one form or another), then using
the 0.9 format (or 1.0, for that matter) superblock is perfectly fine.
Yet at the same time it's so easy to forget to force a device rebuild the
next time the array is assembled, and to forget to check whether your
recovery CD/etc happened to start the array before mounting things, or
whether it got mounted without being part of the raid1 set.

This is why I mentioned using 0.9 / 1.0 only for special cases like
/boot. 1.1 and 1.2 are "better" because they are both at the front of the
partition and make it look nothing like a file-system that can be mounted
until assembled into a raid array. (A tiny demonstration follows at the
bottom of this mail.)

On Wed, Nov 11, 2009 at 12:15 AM, Arild Langseid <arild@langseid.no> wrote:
> Hi Michael:
>
> I went ahead and ran your suggestion anyway, and got this result:
>
> creator:~# mdadm -D /dev/md1
> /dev/md1:
>         Version : 00.90
> [rest trimmed]
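The demonstration (hypothetical member names; the exact error text varies
by distro and kernel version):

    # a lone 0.90/1.0 member looks like a plain filesystem, because the
    # superblock sits at the end - so this mount bypasses the array:
    mount -o ro /dev/sdb1 /mnt

    # with a 1.1/1.2 superblock at the front of the device, the same
    # mount is refused, since no filesystem signature is found at offset 0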
Thread overview: 10+ messages
2009-11-10 14:07 need help: corrupt files on one of my raids Arild Langseid
2009-11-10 15:34 ` Majed B.
[not found] ` <4AF99159.4000800@langseid.no>
2009-11-10 18:26 ` Majed B.
[not found] ` <4AF9CB19.7000803@langseid.no>
2009-11-10 20:31 ` Majed B.
[not found] ` <4AF9D194.6000306@langseid.no>
2009-11-11 2:29 ` Majed B.
2009-11-11 4:16 ` Michael Evans
2009-11-11 8:06 ` Arild Langseid
2009-11-11 8:15 ` Arild Langseid
2009-11-11 8:25 ` Leslie Rhorer
2009-11-11 8:31 ` Michael Evans