new RAID-1 Array always shows filesystem errors...

linux-raid.vger.kernel.org archive mirror

* new RAID-1 Array always shows filesystem errors...
@ 2004-02-21 12:00 Ralph Paßgang
  2004-02-24  2:06 ` Ralph Paßgang
  0 siblings, 1 reply; 2+ messages in thread
From: Ralph Paßgang @ 2004-02-21 12:00 UTC
  To: linux-raid

Hi,

I have been running a RAID-1 with two 123.5 GB drives for about 1.5 years and 
never experienced problems with the RAID part. A month ago one drive failed 
(constant IRQ timeouts), so I started planning a new system.

I decided to start from scratch, this time with two RAID-1 arrays (four disks 
altogether). I wanted LVM (one big VG) on top of the two arrays.

So I created the two NEW RAID arrays, but one of them started with only one 
disk, its mirror disk missing. The reason: my old RAID device (degraded due 
to the disk failure) still held my old data, which I wanted to copy to the 
new LVM first; afterwards I would use the old disk to complete the new 
array.

At first everything went fine, but later I noticed massive filesystem 
errors, so I deleted the new LVM and the two new RAID-1 arrays. Luckily I 
still had my old RAID-1 array with the old data, so no data was lost.

Then I created the two RAID arrays again, this time without LVM, because I 
thought the LVM part was the problem. Again I started in degraded mode with 
only one disk out of two in each array.

This time the data seemed to be OK, so I completed the first new array by 
adding the second disk. The sync process went fine, but then I noticed 
problems again: whenever I read data I get massive filesystem errors. I was 
surprised and removed the mirror disk (the one I had just hot-added). Now 
the data seems to be fine again: no data corruption, no filesystem errors.

I tried the same on the second RAID array and saw the same strange errors. 
In degraded one-disk mode the array is OK, but once the second disk is added 
and resynced, I get these problems.

There are some MP3s on the second array, so I played some of them (to see 
whether these are real errors or bogus ones) and noticed that the song skips 
every 2-3 seconds for 1-2 seconds. It looks as if reads from disk 1 are 
fine, but half of the time disk 2 should deliver the data and does not. The 
filesystem errors show up in the log while I am playing MP3s.
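
One way to test the suspicion that the second disk returns different data 
would be to compare the two mirror halves directly. The following is only a 
rough sketch, not something from the original setup: the function name and 
the `data_bytes` parameter are hypothetical, and it assumes the array is 
stopped (or at least idle) so that in-flight writes do not produce false 
differences.

```python
def mismatching_chunks(path_a, path_b, data_bytes, chunk_size=64 * 1024):
    """Return the byte offsets of chunks that differ between two mirror halves.

    Only the first `data_bytes` bytes are compared, so per-disk md metadata
    stored past the data area does not show up as a false mismatch.
    """
    offsets = []
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        offset = 0
        while offset < data_bytes:
            want = min(chunk_size, data_bytes - offset)
            block_a = a.read(want)
            block_b = b.read(want)
            if block_a != block_b:
                offsets.append(offset)
            if len(block_a) < want or len(block_b) < want:
                break  # one side ended early, nothing more to compare
            offset += want
    return offsets
```

For md0 this would be called with the two member devices (/dev/hde and 
/dev/hdb) and the array size in bytes (120627264 * 1024). An empty result 
would mean the copies agree and the problem is more likely on the read path.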

Some files and directories are not accessible at all anymore (and even if 
they were, the data would be useless). If I run "ls" in a directory on the 
RAID-1 device (with two synced disks), it reports "permission denied" or 
"file not found" for about half of the files (on an "ls *"), and I am root!

The strange thing is that the old RAID is/was fine. I just cannot build a 
new RAID-1, because after the disk sync I get these errors.

Here are examples of the filesystem errors I see (thousands of them in my log):
Feb 21 01:27:52 services kernel: vs-5150: search_by_key: invalid format found in block 3783953. Fsck?
Feb 21 01:27:52 services kernel: is_tree_node: node level 0 does not match to the expected one 1
Feb 21 01:27:52 services kernel: vs-5150: search_by_key: invalid format found in block 3783953. Fsck?
...

(It can't be a filesystem error, because as soon as I remove a drive the 
errors are gone)
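
With thousands of such messages it may help to see whether the same few 
blocks are reported over and over or whether the errors are spread across 
the whole device. A minimal sketch, assuming the log text is available as a 
string; the function name `bad_blocks` is made up for illustration, and only 
the vs-5150 message shown above is matched.

```python
import re
from collections import Counter

# Matches the reiserfs message quoted above and captures the block number.
BLOCK_RE = re.compile(
    r"vs-5150: search_by_key: invalid format found in block (\d+)"
)

def bad_blocks(log_text):
    """Count how often each block number appears in vs-5150 messages."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return Counter(int(n) for n in BLOCK_RE.findall(flat))
```

Calling `bad_blocks(text).most_common(10)` would list the ten most 
frequently reported blocks.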

My Setup is this:
DEVICE  /dev/hde /dev/hdf /dev/hdb /dev/hdg

ARRAY   /dev/md0 devices=/dev/hde,/dev/hdb
ARRAY   /dev/md1 devices=/dev/hdf,missing

(These are only the two new arrays; the old one is deleted now. /dev/md1 is 
working in this setup; md0 gives the errors because both disks are in.)

I use Debian unstable with linux-2.6.1 (vanilla + pnpbios patch), mdadm 
1.4.0-3 (bugs.debian.org lists no such bug for mdadm) and a reiserfs 3.6 
filesystem on the md devices.

My hard disk setup is this:
    ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:pio, hdd:pio
    ide2: BM-DMA at 0xdc00-0xdc07, BIOS settings: hde:pio, hdf:pio
    ide3: BM-DMA at 0xdc08-0xdc0f, BIOS settings: hdg:pio, hdh:pio
hda: IC35L060AVV207-0, ATA DISK drive
hdb: IC35L120AVVA07-0, ATA DISK drive
hde: IC35L120AVV207-1, ATA DISK drive
hdf: IC35L120AVV207-1, ATA DISK drive
hdg: IC35L120AVV207-0, ATA DISK drive

hda: 120103200 sectors (61492 MB) w/1821KiB Cache, CHS=16383/255/63, UDMA(100)
 hda: hda1 hda2
hdb: max request size: 128KiB
hdb: 241254720 sectors (123522 MB) w/1863KiB Cache, CHS=65535/15/63, UDMA(100)
 hdb: unknown partition table
hde: max request size: 1024KiB
hde: 241254720 sectors (123522 MB) w/7965KiB Cache, CHS=16383/255/63, UDMA(100)
 hde: unknown partition table
hdf: max request size: 1024KiB
hdf: 241254720 sectors (123522 MB) w/7965KiB Cache, CHS=16383/255/63, UDMA(100)
 hdf: unknown partition table
hdg: max request size: 1024KiB
hdg: 241254720 sectors (123522 MB) w/1821KiB Cache, CHS=16383/255/63, UDMA(100)
 hdg: unknown partition table

Personalities : [raid1]
md1 : active raid1 hdf[0]
      120627264 blocks [2/2] [UU]

md0 : active raid1 hde[0]
      120627264 blocks [2/1] [U_]

unused devices: <none>
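
The array size of 120627264 blocks can be cross-checked against the 
241254720 sectors the kernel reports for each 120 GB drive. The sketch below 
assumes the arrays use the classic 0.90 md superblock (the mail does not say 
so explicitly), which sits in the last 64 KiB-aligned 64 KiB of the device; 
the function name is made up for illustration.

```python
SECTOR_BYTES = 512
MD_RESERVED_KIB = 64  # assumed: 0.90 superblock reserves a 64 KiB slot

def md090_data_kib(sectors):
    """Usable size in KiB of a member device with a 0.90 superblock."""
    device_kib = sectors * SECTOR_BYTES // 1024
    # Round down to a 64 KiB boundary, then step back one 64 KiB slot.
    return (device_kib // MD_RESERVED_KIB) * MD_RESERVED_KIB - MD_RESERVED_KIB

print(md090_data_kib(241254720))  # 120627264
```

The result matches the block count in /proc/mdstat, so the size the md 
layer reports is what one would expect for whole-disk members of this size.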

The RAID layer (and mdadm) does not seem to detect this error. I am quite 
sure it has nothing to do with the drives or the PC: I use four drives in 
two arrays, so it is not a "one drive is broken" thing.

Please help me: the data on the two arrays is quite important to me, so 
running two plain disks that hold the data is no solution for me. I had a 
drive failure once and the RAID saved me, so I need the two new RAID 
arrays. Maybe it is a kernel thing? 2.6.1 is not the latest; should I 
upgrade to 2.6.3? Could it be mdadm? Should I recompile it against my kernel?

Thanks for your help... and sorry for my bad English :)

Best regards,
 Ralph


* Re: new RAID-1 Array always shows filesystem errors...
  2004-02-21 12:00 new RAID-1 Array always shows filesystem errors Ralph Paßgang
@ 2004-02-24  2:06 ` Ralph Paßgang
  0 siblings, 0 replies; 2+ messages in thread
From: Ralph Paßgang @ 2004-02-24  2:06 UTC
  To: linux-raid

Hi,

I have investigated the problem I described some more, and I think it has 
something to do with changes in the RAID code between 2.4.25 and 2.6.1 
(those are the versions I used most recently).

In short, the problem I noticed:

An old RAID-1 array (2/2 disks) worked with 2.6.1, but when I tried to 
create a new one I had problems.

I always started in degraded mode (1/2 drives), because the second hard drive 
for the RAID-1 array was still in use at that moment. Later I added the 
missing drive, and after the sync process I could not access a lot of files, 
and the files I could access were damaged (about 50% seemed to be missing, or 
at least 50% was garbage). After removing the second disk again the data was 
OK again. dmesg / syslog showed reiserfs errors (see my first mail).

Now I have downgraded to 2.4.25 again. I still had the RAID-1 array with the 
missing hard drive in degraded mode, which I had set up under the 2.6.1 
kernel, so I added the missing drive again.

After syncing, the content is still OK: no broken files, no data 
corruption, even after two reboots. The only difference between disaster 
and a perfect system seems to be my kernel version.

Here are the facts about my system once again:

AMD Duron 900, 1x 60 GB (the Linux system itself), 4x 120 GB (2x RAID-1), 
VIA IDE controller + Promise Ultra 133 controller.

Debian unstable with GCC 3.3.2, libc6 2.3.2
Vanilla Kernel 2.4.25 / Vanilla Kernel 2.6.1 (plus pnpbios patch)
mdadm 1.4.0
raidtools 0.42

/proc/mdstat example for my setup:
read_ahead 1024 sectors
md1 : active raid1 hdf[0] hdg[1]
      120627264 blocks [2/2] [UU]

md0 : active raid1 hde[0]
      120627264 blocks [2/1] [U_]

Because my old RAID array worked under 2.6.1 without a problem with both 
disks (I never did a resync under 2.6), I think it has something to do with 
the sync. To hot-add a hard drive I use:

mdadm /dev/md0 -a /dev/hdf (for example)

Is the sync process a kernel or a userland thing? If it is userland, maybe it 
is also a Debian bug; if so, please let me know and I will file it in the 
Debian bug tracking system.

thanks,

--Ralph

