* need help: corrupt files on one of my raids
@ 2009-11-10 14:07 Arild Langseid
  2009-11-10 15:34 ` Majed B.
  0 siblings, 1 reply; 10+ messages in thread

From: Arild Langseid @ 2009-11-10 14:07 UTC (permalink / raw)
To: linux-raid

Hi all!

I have a strange problem with corrupted files on my raid1 volume. (A raid5
volume on the same computer works just fine.)

One of my raids (md1) is a raid1 with two 1TB sata drives.
I am running lvm on the raid, and two of the volumes on the raid are:
/dev/vg0sata/lv0_bilderArchive
/dev/vg0sata/lv0_bilderProjects
(For your info: "bilder" in Norwegian is "pictures" in English.)

What I want:
I want to use lv0_bilderArchive to store my pictures unmodified and
lv0_bilderProjects to hold my edited pictures and projects.

My problem is:

My files are corrupted. Usually the files (crw/cr2/jpg) are stored ok, but
are corrupted later when new files/directories are added to the volume.
Sometimes the files are corrupted instantly at save time.

I discovered this first when copying from my laptop to the server via
samba. By testing I have found that this behaviour also applies when I
copy locally on the server from the raid5 (md0) to the faulty raid1 (md1)
with cp -a.

I have tested with both reiserfs and ext3 filesystems. The file corruption
happens on both reiserfs and ext3.

One of my test procedures was as follows:
1. Copied 21 pictures locally to the root of the lv0_bilderProjects
volume: first 10 pictures, then 11 more, with cp -a. All pictures survived
and were stored uncorrupted.
2. Then I copied a whole directory tree with cp -a to the
lv0_bilderProjects volume. Many pictures were corrupted, a few stored ok.
All small text files with exif info seem ok. All files in the volume root
copied in 1) are ok.
3. Then I copied one more directory tree. All pictures seem ok. Mostly jpg
this time.
4. Then I copied one more directory tree, larger this time. Now the first
21 pictures in the volume root are corrupted. All of them - and some of
them in a way that my browser can't show them at all but shows an error
message.

From these tests I conclude that samba, the network and the type of
filesystem are not the source of my problems.

I have the same problem on all lvm volumes on the raid in question (md1).

What's common and what's different on my two raids:

differences between the two raid systems:
md0 (working correctly) is a raid5, three ide disks, 200GB each.
md1 (corrupted files) is a raid1, two sata disks, 1TB each.

common:
I use lvm on both raid devices to host my filesystems.

other useful information:
I use Debian:
creator:~# cat /proc/version
Linux version 2.6.18-6-686 (Debian 2.6.18.dfsg.1-26etch1)
(dannf@debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian
4.1.1-21)) #1 SMP Thu Nov 5 16:28:13 UTC 2009

I have run apt-get update and apt-get upgrade, and everything seems to be
up to date.

The sata disks are hosted on the motherboard: ABit NF7.
The disks hosting the raid I have trouble with (md1) are Hitachi Deskstar
1TB 16MB SATA2 7200RPM, 0A38016.

The output from mdadm --detail /dev/md1 and cat /proc/mdstat seems ok, but
I can post the results here on request. The same applies to the output
from pvdisplay, vgdisplay and lvdisplay. They seem ok, but I can post on
request.

Due to the time it takes to build a 1TB raid I have not tried to use the
disks in md1 without raiding them. Is it a good idea to tear the raid down
and test the disks directly, or do any of you have other ideas to test
before I take this time-consuming action?

Any ideas out there? Links to information I should read?
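For what it's worth, my "detection" so far is just opening the pictures in
a browser. A checksum pass would confirm the corruption more objectively -
something like this sketch (the paths here are examples, not my real ones):

    # record checksums of the source tree before copying
    (cd /data/src && find . -type f -exec md5sum {} + > /tmp/src.md5)

    # copy onto the suspect raid1 volume; add more trees later, as in my tests
    cp -a /data/src/. /mnt/bilderProjects/test/

    # after the later writes, re-verify the first copy;
    # any mismatch means silent corruption
    (cd /mnt/bilderProjects/test && md5sum -c --quiet /tmp/src.md5)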
Thank heaven for my backup routines, including full copies on cold hard
drives both in my safe and off-site :-D

Thanks for all help!

Best Regards,
Arild, Oslo, Norway
* Re: need help: corrupt files on one of my raids
  2009-11-10 14:07 need help: corrupt files on one of my raids Arild Langseid
@ 2009-11-10 15:34 ` Majed B.
  [not found] ` <4AF99159.4000800@langseid.no>
  0 siblings, 1 reply; 10+ messages in thread

From: Majed B. @ 2009-11-10 15:34 UTC (permalink / raw)
To: Arild Langseid; +Cc: linux-raid

If you have smartmontools installed, run smartctl -a /dev/sdx

Look for any number that is bigger than 1 on these:
Reallocated_Event_Count
Current_Pending_Sector
Offline_Uncorrectable
UDMA_CRC_Error_Count
Raw_Read_Error_Rate
Reallocated_Sector_Ct
Load_Retry_Count

You may not have some of these. That's OK.

If you don't have the package, install it and configure it to run short
tests daily and long tests on weekends (at idle times).
To run an immediate long test, issue this command:
smartctl -t offline /dev/sdx

Note: An offline test is a long test and may take up to 20 hours. An
offline test is required to get the numbers for the parameters above.

If you're using the ext3 filesystem, it would have automatically checked
for bad sectors at the time the volume was formatted.

I would also suggest you run a fsck on your filesystems.
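For the scheduled tests, smartd can handle the calendar for you. A minimal
sketch of /etc/smartd.conf (I'm assuming your two raid1 members are
/dev/sdb and /dev/sdc; check the smartd.conf man page before relying on
this) that runs a short self-test daily at 02:00 and a long one every
Saturday at 03:00:

    # -a monitors all SMART attributes; -s schedules self-tests
    # the -s regexp fields are T/MM/DD/d/HH: type, month, day, weekday, hour
    /dev/sdb -a -s (S/../.././02|L/../../6/03)
    /dev/sdc -a -s (S/../.././02|L/../../6/03)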
On Tue, Nov 10, 2009 at 5:07 PM, Arild Langseid <arild@langseid.no> wrote:
> Hi all!
>
> I have a strange problem with corrupted files on my raid1 volume. (A
> raid5 volume on the same computer works just fine.)
> [rest of the original report trimmed]

--
Majed B.
[parent not found: <4AF99159.4000800@langseid.no>]
* Re: need help: corrupt files on one of my raids
  [not found] ` <4AF99159.4000800@langseid.no>
@ 2009-11-10 18:26 ` Majed B.
  [not found] ` <4AF9CB19.7000803@langseid.no>
  0 siblings, 1 reply; 10+ messages in thread

From: Majed B. @ 2009-11-10 18:26 UTC (permalink / raw)
To: linux-raid

Either your motherboard doesn't support SMART or, worse, your disks don't
support SMART.

I have a bunch of Hitachi disks that don't support SMART, which is very
bad since I can't monitor their health status.

Download the disk's manual and check whether it lists S.M.A.R.T.
capabilities. To read more and understand what S.M.A.R.T. is, check this:
http://en.wikipedia.org/wiki/S.M.A.R.T.

While I was searching for your disk model, I noticed a couple of links
complaining about disk failures. I didn't see whether the disk itself has
SMART or not.

You might want to check your motherboard's manual for SMART support as
well.

P.S.: Use reply-all ;)
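P.P.S.: Before concluding that the hardware lacks SMART, rule out the tool
itself: old smartctl builds sometimes talk SCSI to a SATA disk, and your
output ("Device: ATA ... Device type: disk") hints at exactly that. A
rough sequence to try - flags from memory, so check your smartctl man page:

    smartctl -i /dev/sdb          # identify; look for "SMART support is: ..."
    smartctl -s on /dev/sdb       # ask the drive to enable SMART
    smartctl -d ata -a /dev/sdb   # force ATA handling on older builds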
On Tue, Nov 10, 2009 at 7:14 PM, Arild Langseid <arild@langseid.no> wrote:
> Hi Majed!
>
> Thank you for your time to help me. I have also been thinking of a
> hardware fault.
>
> I installed smartmontools, but unfortunately I got this result:
>
> creator:~# smartctl -a /dev/sdb
> smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> Device: ATA Hitachi HDT72101 Version: ST6O
> Serial number: STF604MH0K4X0B
> Device type: disk
> Local Time is: Tue Nov 10 17:43:32 2009 CET
> Device does not support SMART
>
> Error Counter logging not supported
>
> [GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
> Device does not support Self Test logging
> creator:~#
>
> Is "smart" something I have to enable?
>
> I have checked my bios, and did not find anything regarding smart there.
>
> Best Regards,
> Arild
>
> Majed B. wrote:
>> If you have smartmontools installed, run smartctl -a /dev/sdx
>> [earlier advice trimmed]

--
Majed B.
[parent not found: <4AF9CB19.7000803@langseid.no>]
* Re: need help: corrupt files on one of my raids
  [not found] ` <4AF9CB19.7000803@langseid.no>
@ 2009-11-10 20:31 ` Majed B.
  [not found] ` <4AF9D194.6000306@langseid.no>
  0 siblings, 1 reply; 10+ messages in thread

From: Majed B. @ 2009-11-10 20:31 UTC (permalink / raw)
To: LinuxRaid

The numbers will be reported as zeros if you have never run an offline
test before. Run it, and then you'll get to see whether you have bad
sectors or not.

Have you tried running a filesystem check? (fsck)
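Concretely, something along these lines (a sketch - the device and LV
names are taken from your earlier mails, so adjust to match your system):

    smartctl -t offline /dev/sdb    # start the offline data collection test
    smartctl -t offline /dev/sdc    # it runs in the background on the drive
    # many hours later, re-read the attributes and the error log:
    smartctl -a /dev/sdb
    smartctl -a /dev/sdc

    # filesystem check: unmount the volume first
    umount /dev/vg0sata/lv0_bilderProjects
    fsck /dev/vg0sata/lv0_bilderProjects   # or reiserfsck --check on reiserfs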
On Tue, Nov 10, 2009 at 11:20 PM, Arild Langseid <arild@langseid.no> wrote:
> Hi and thanks again!
>
> I did not find the feature list for my disk either. Instead I found that
> my smartmontools was very old.
> I upgraded my Debian Etch to Debian Lenny (took some time...), and now
> smartctl works. I was lucky about the SMART feature on my disks and
> motherboard.
>
> Output for /dev/sdb:
>   1 Raw_Read_Error_Rate     0x000b 100 100 016 Pre-fail Always  -  0
>   5 Reallocated_Sector_Ct   0x0033 100 100 005 Pre-fail Always  -  0
> 196 Reallocated_Event_Count 0x0032 100 100 000 Old_age  Always  -  0
> 197 Current_Pending_Sector  0x0022 100 100 000 Old_age  Always  -  0
> 198 Offline_Uncorrectable   0x0008 100 100 000 Old_age  Offline -  0
> 199 UDMA_CRC_Error_Count    0x000a 200 200 000 Old_age  Always  -  0
>
> SMART Error Log Version: 1
> No Errors Logged
>
> and /dev/sdc:
>   1 Raw_Read_Error_Rate     0x000b 100 100 016 Pre-fail Always  -  0
>   5 Reallocated_Sector_Ct   0x0033 100 100 005 Pre-fail Always  -  0
> 196 Reallocated_Event_Count 0x0032 100 100 000 Old_age  Always  -  0
> 197 Current_Pending_Sector  0x0022 100 100 000 Old_age  Always  -  0
> 198 Offline_Uncorrectable   0x0008 100 100 000 Old_age  Offline -  0
> 199 UDMA_CRC_Error_Count    0x000a 200 200 000 Old_age  Always  -  0
>
> SMART Error Log Version: 1
> No Errors Logged
>
> Seems ok to me. Do you agree?
>
> After upgrading to Debian Lenny I still get corrupted files, though :(
>
> Best Regards,
> Arild
>
> Majed B. wrote:
>> Either your motherboard doesn't support SMART or, worse, your disks
>> don't support SMART.
>> [earlier discussion trimmed]
--
Majed B.
[parent not found: <4AF9D194.6000306@langseid.no>]
* Re: need help: corrupt files on one of my raids
  [not found] ` <4AF9D194.6000306@langseid.no>
@ 2009-11-11 2:29 ` Majed B.
  2009-11-11 4:16 ` Michael Evans
  0 siblings, 1 reply; 10+ messages in thread

From: Majed B. @ 2009-11-11 2:29 UTC (permalink / raw)
To: LinuxRaid

If you have no data on the volumes, would you mind reformatting them with
a different filesystem (ext3/ext4/xfs) to see if you still get data
corruption? (A command sketch follows at the bottom of this mail.)

On Tue, Nov 10, 2009 at 11:48 PM, Arild Langseid <arild@langseid.no> wrote:
> Yes, I have run a fsck:
>
> ###########
> reiserfsck --check started at Tue Nov 10 22:15:27 2009
> ###########
> Replaying journal..
> Reiserfs journal '/dev/mapper/vg0sata-lv0_multimedia' in blocks [18..8211]:
> 0 transactions replayed
> Checking internal tree..finished
> Comparing bitmaps..finished
> Checking Semantic tree:
> finished
> No corruptions found
> There are on the filesystem:
>         Leaves 82
>         Internal nodes 1
>         Directories 7
>         Other files 101
>         Data block pointers 77964 (0 of them are zero)
>         Safe links 0
> ###########
> reiserfsck finished at Tue Nov 10 22:15:29 2009
> ###########
>
> Seems ok to me.
>
> I will now run the offline checks you suggested earlier.
>
> As it is no longer clear that I have a raid problem, as I first thought:
> is there any other mailing list I should ask about my problem, or is it
> ok to continue here?
>
> Best Regards,
> Arild
>
> Majed B. wrote:
>> The numbers will be reported as zeros if you have never run an offline
>> test before.
>> [earlier discussion trimmed]
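The sketch mentioned above (the LV path is taken from your earlier mails,
the mkfs flags are from memory, so double-check the man pages - and this
of course wipes the volume):

    umount /dev/vg0sata/lv0_bilderProjects
    mkfs.xfs -f /dev/vg0sata/lv0_bilderProjects    # try xfs, or:
    mkfs.ext3 -c /dev/vg0sata/lv0_bilderProjects   # -c scans for bad blocks
                                                   # during formatting

--
Majed B.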
* Re: need help: corrupt files on one of my raids
  2009-11-11 2:29 ` Majed B.
@ 2009-11-11 4:16 ` Michael Evans
  2009-11-11 8:06 ` Arild Langseid
  2009-11-11 8:15 ` Arild Langseid
  0 siblings, 2 replies; 10+ messages in thread

From: Michael Evans @ 2009-11-11 4:16 UTC (permalink / raw)
To: Majed B.; +Cc: LinuxRaid

One other thing besides what's already been mentioned: you are seeing
issues with your raid1 and -not- your raid5 volume. If you run
mdadm -D /dev/md(whatever the number is), what version (for the
superblock) is reported? Preferably you will be using superblock format
1.1 or 1.2 (ideally 1.1). The only reasons to use the 1.0 or 0.9 formats
are special cases, such as for /boot-style volumes.
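To check both the assembled array and its members, something like this
(a sketch; the member names are guesses, use whatever /proc/mdstat lists):

    mdadm -D /dev/md1 | grep -i version     # superblock version of the array
    mdadm -E /dev/sdb1 | grep -i version    # examine one member's superblock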
* Re: need help: corrupt files on one of my raids
  2009-11-11 4:16 ` Michael Evans
@ 2009-11-11 8:06 ` Arild Langseid
  0 siblings, 0 replies; 10+ messages in thread

From: Arild Langseid @ 2009-11-11 8:06 UTC (permalink / raw)
To: LinuxRaid; +Cc: Michael Evans, Majed B.

Hi Michael and Majed!

Thank you very much for your help!

One good fellow in Australia recognized my motherboard as the source of
my problems, and sent me the message below privately, as he does not have
write permission to the list.

As Tony writes, my problems started when I connected the second drive and
started raiding it. The raid5 volume is on two separate Promise IDE
controllers, which is why I have no problems with that one.

Tony wrote:
> Hi Arild -
>
> Greetings from Australia...
>
> As I only have read access to the Raid List - I hope you do not mind my
> replying to you directly.
>
> I understand that you are using an Abit NF7 motherboard with two onboard
> Silicon Image 3112 SATA ports (which makes it a NF7-S) to which are
> connected two SATA drives - and you are having data corruption problems
> with these particular drives.
>
> This is a known problem with this motherboard. If you connect only one
> drive - no problem. However, connect two - and you have problems!
>
> I actually have one of these motherboards, and to solve the problem I
> disabled the onboard SATA and installed a SATA controller in one of the
> PCI slots. It was the quickest and easiest solution. I needed more than
> two SATA ports anyway - so it wasn't a problem. I just installed a
> 4-port card.
>
> If you search on the Internet there are a number of discussions on this.
> Examples, from just a very quick search - you may find a good fix; I
> took the easy way out!
>
> http://www.techspot.com/vb/all/windows/t-5278-SATA-RAID-data-corruption-problem-update.html
> http://www.nforcershq.com/forum/image-vp65255.html
> http://www.tomshardware.com/forum/103163-30-bios-update-problem
>
> Good Luck, Tony

Thank you very much for all the help.

Best Regards,
Arild

Michael Evans wrote:
> One other thing besides what's already been mentioned. You are seeing
> issues with Raid1 and -not- your Raid5 volume.
> [rest trimmed]
* Re: need help: corrupt files on one of my raids
  2009-11-11 4:16 ` Michael Evans
  2009-11-11 8:06 ` Arild Langseid
@ 2009-11-11 8:15 ` Arild Langseid
  2009-11-11 8:25 ` Leslie Rhorer
  2009-11-11 8:31 ` Michael Evans
  1 sibling, 2 replies; 10+ messages in thread

From: Arild Langseid @ 2009-11-11 8:15 UTC (permalink / raw)
To: Michael Evans; +Cc: LinuxRaid

Hi Michael:

I went ahead and ran your suggestion anyway, and got this result:

creator:~# mdadm -D /dev/md1
/dev/md1:
        Version : 00.90

Is it a big problem to be running version 0.9? I use the version of the
tools that comes with Debian - and as they are somewhat conservative,
their versions lag somewhat behind.

I have now upgraded my Debian Etch to Debian Lenny and have this mdadm:
creator:~# mdadm --version
mdadm - v2.6.7.2 - 14th November 2008

When I fix the firmware issues on my motherboard or buy a separate sata
controller... will the superblock be one of the versions you suggest when
I create the raid again? If not, is it a big enough problem that I should
upgrade my tools beyond what Debian provides?

Best Regards,
Arild

Michael Evans wrote:
> One other thing besides what's already been mentioned. You are seeing
> issues with Raid1 and -not- your Raid5 volume.
> [rest trimmed]
* RE: need help: corrupt files on one of my raids
  2009-11-11 8:15 ` Arild Langseid
@ 2009-11-11 8:25 ` Leslie Rhorer
  0 siblings, 0 replies; 10+ messages in thread

From: Leslie Rhorer @ 2009-11-11 8:25 UTC (permalink / raw)
To: linux-raid

> creator:~# mdadm -D /dev/md1
> /dev/md1:
>         Version : 00.90
>
> Is it a big problem to be running version 0.9? I use the version of the
> tools that comes with Debian - and as they are somewhat conservative,
> their versions lag somewhat behind.

Debian is very conservative, but that's not the issue here.

> I have now upgraded my Debian Etch to Debian Lenny and have this mdadm:
> creator:~# mdadm --version
> mdadm - v2.6.7.2 - 14th November 2008
>
> When I fix the firmware issues on my motherboard or buy a separate sata
> controller... will the superblock be one of the versions you suggest
> when I create the raid again?

It will, but not only with Debian. All current releases of mdadm still
default to a 0.9 superblock. Neil has been talking about changing that,
but as of now it is still true for all versions of mdadm. I think it was
even true for mdadm 3.1, which was withdrawn.

> If not, is it a big enough problem that I should upgrade my tools beyond
> what Debian provides?

There's nothing truly horrible about a 0.9 superblock, unless your RAID
array is going to grow to be quite large. The existing tools allow a
version 1.x superblock, but you must select the version yourself when you
create the array. If you don't, it will default to 0.9.
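For reference, selecting the format at creation time looks roughly like
this (a sketch - the member names are hypothetical, and --create destroys
any existing superblock and data on the devices you list):

    mdadm --create /dev/md1 --metadata=1.1 --level=1 --raid-devices=2 \
          /dev/sdb1 /dev/sdc1

--metadata (short form -e) accepts 0.90, 1.0, 1.1 and 1.2 in this
generation of mdadm.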
* Re: need help: corrupt files on one of my raids
  2009-11-11 8:15 ` Arild Langseid
  2009-11-11 8:25 ` Leslie Rhorer
@ 2009-11-11 8:31 ` Michael Evans
  1 sibling, 0 replies; 10+ messages in thread

From: Michael Evans @ 2009-11-11 8:31 UTC (permalink / raw)
To: Arild Langseid; +Cc: LinuxRaid

Besides the man-page-documented issues with 0.90 superblocks, there is
the fact that they are stored at the -end- of the partition. This is a
good thing for boot loaders like grub, which would like read-only access
to partitions to look at file-systems and read data. However, at the same
time it is a -VERY- bad thing, because each member of that raid1 array
looks like its own file-system. Think about what happens when one copy of
that file-system is changed and the other(s) is (are) not.

If you're careful, or don't care about the data in that partition (the
data in /boot is generally very -nice- to have, but the system can be
recovered and that data regenerated in one form or another), then using
the 0.9 format (or 1.0, for that matter) superblock is perfectly fine.
Yet at the same time it's so easy to forget to force a device rebuild the
next time the array is assembled, and to forget to check whether your
recovery CD/etc happened to start the array before mounting things, or
whether it got mounted without being part of the raid1 set.

This is why I mentioned using 0.9 / 1.0 only for special cases like
/boot. 1.1 and 1.2 are "better" because they are both at the front of the
partition and make it look nothing like a file-system that can be mounted
until assembled into a raid array. (A tiny demonstration follows at the
bottom of this mail.)

On Wed, Nov 11, 2009 at 12:15 AM, Arild Langseid <arild@langseid.no> wrote:
> Hi Michael:
>
> I went ahead and ran your suggestion anyway, and got this result:
>
> creator:~# mdadm -D /dev/md1
> /dev/md1:
>         Version : 00.90
> [rest trimmed]
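The demonstration (hypothetical member names; the exact error text varies
by distro and kernel version):

    # a lone 0.90/1.0 member looks like a plain filesystem, because the
    # superblock sits at the end - so this mount bypasses the array:
    mount -o ro /dev/sdb1 /mnt

    # with a 1.1/1.2 superblock at the front of the device, the same
    # mount is refused, since no filesystem signature is found at offset 0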
Thread overview: 10+ messages
2009-11-10 14:07 need help: corrupt files on one of my raids Arild Langseid
2009-11-10 15:34 ` Majed B.
[not found] ` <4AF99159.4000800@langseid.no>
2009-11-10 18:26 ` Majed B.
[not found] ` <4AF9CB19.7000803@langseid.no>
2009-11-10 20:31 ` Majed B.
[not found] ` <4AF9D194.6000306@langseid.no>
2009-11-11 2:29 ` Majed B.
2009-11-11 4:16 ` Michael Evans
2009-11-11 8:06 ` Arild Langseid
2009-11-11 8:15 ` Arild Langseid
2009-11-11 8:25 ` Leslie Rhorer
2009-11-11 8:31 ` Michael Evans