* MD drivers problems on 2.4.x
@ 2003-05-23 13:35 Eric BENARD / Free
2003-05-26 0:24 ` Juri Haberland
From: Eric BENARD / Free @ 2003-05-23 13:35 UTC
To: linux-raid
Hi,
What is the status of the md driver in the 2.4.19 and 2.4.20 kernels?
I've had many problems with RAID 0 (/tmp), 1 (/, /usr/var) and 5 (/home) using
these kernels and 4 IDE drives with ext3 partitions. In the end, each time, I
lose all the data :-(
After many tests (the RAM is OK, the drives have no bad blocks), I think the
problem comes from the md layer.
Might switching to 2.4.21-rc3 solve my problems?
Has anyone here already run into this problem?
Many thanks and best regards
Eric
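On 2.4 kernels with raidtools, a layout like the one described above is
declared in /etc/raidtab. The fragment below is only an illustrative sketch:
the device names (hde..hdh, where an add-on Promise controller typically
appears), partition numbers and chunk sizes are assumptions, not the actual
configuration.

    # Sketch of an /etc/raidtab for the layout described above (raidtools era).
    # Device names, partition numbers and chunk sizes are assumed.

    # RAID 1 for /
    raiddev /dev/md0
        raid-level              1
        nr-raid-disks           2
        persistent-superblock   1
        chunk-size              4
        device                  /dev/hde1
        raid-disk               0
        device                  /dev/hdg1
        raid-disk               1

    # RAID 5 for /home (RAID 0 for /tmp would be declared the same way,
    # with raid-level 0 and no parity-algorithm line)
    raiddev /dev/md2
        raid-level              5
        nr-raid-disks           4
        persistent-superblock   1
        parity-algorithm        left-symmetric
        chunk-size              64
        device                  /dev/hde3
        raid-disk               0
        device                  /dev/hdf3
        raid-disk               1
        device                  /dev/hdg3
        raid-disk               2
        device                  /dev/hdh3
        raid-disk               3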
* Re: MD drivers problems on 2.4.x
2003-05-23 13:35 MD drivers problems on 2.4.x Eric BENARD / Free
@ 2003-05-26 0:24 ` Juri Haberland
2003-05-26 6:17 ` Eric BENARD / Free
From: Juri Haberland @ 2003-05-26 0:24 UTC
To: linux-raid
Eric BENARD / Free <ebenard@free.fr> wrote:
> Hi,
>
> What is the status of the md driver in the 2.4.19 and 2.4.20 kernels?
> I've had many problems with RAID 0 (/tmp), 1 (/, /usr/var) and 5 (/home) using
> these kernels and 4 IDE drives with ext3 partitions. In the end, each time, I
> lose all the data :-(
> After many tests (the RAM is OK, the drives have no bad blocks), I think the
> problem comes from the md layer.
>
> Might switching to 2.4.21-rc3 solve my problems?
>
> Has anyone here already run into this problem?
>
> Many thanks and best regards
Sorry, no problems here. /, /home, swap and the other stuff all on RAID1
and humming along.
We need more data from you. What is failing in which way? Any error
messages, kernel oopses?
Cheers,
Juri
--
Juri Haberland <juri@koschikode.com>
* Re: MD drivers problems on 2.4.x
2003-05-26 0:24 ` Juri Haberland
@ 2003-05-26 6:17 ` Eric BENARD / Free
2003-05-26 7:09 ` Riley Williams
2003-05-27 5:30 ` danci
From: Eric BENARD / Free @ 2003-05-26 6:17 UTC
To: Juri Haberland; +Cc: linux-raid
On Monday 26 May 2003 at 02:24, Juri Haberland wrote:
> Eric BENARD / Free <ebenard@free.fr> wrote:
> > Hi,
> >
> > What is the status of the md driver in the 2.4.19 and 2.4.20
> > kernels?
> > I've had many problems with RAID 0 (/tmp), 1 (/, /usr/var) and 5 (/home)
> > using these kernels and 4 IDE drives with ext3 partitions. In the end,
> > each time, I lose all the data :-(
> > After many tests (the RAM is OK, the drives have no bad blocks), I think
> > the problem comes from the md layer.
> >
> > Might switching to 2.4.21-rc3 solve my problems?
> >
> > Has anyone here already run into this problem?
> >
> > Many thanks and best regards
>
> Sorry, no problems here. /, /home, swap and the other stuff all on RAID1
> and humming along.
> We need more data from you. What is failing in which way? Any error
> messages, kernel oopses?
>
In fact, I don't get many error messages (or they were not reported to me by
the user of this PC).
I'm using a Promise PDC20277 IDE controller (it has pseudo-RAID functions, but
I'm not using them).
Four Western Digital 120 GB drives with 8 MB of cache are connected to it
(master/slave, with good cables).
The power supply is sized to cover the system's full power draw.
I was using 2.4.20 with Alan Cox's patches in order to get the new IDE layer
that supports this controller.
Maybe the problem comes from the IDE layer in these -ac kernels?
After heavy data transfers on the disks, filesystem corruption occurs.
Last time, a PHP script reported this error: failed: Input/output error (5).
The server was rebooted and never came up again, as the filesystems were too
badly corrupted.
Today I'm going to set it up again with 2.4.21-rc3 and run several tests using
the Linux Test Project ( http://ltp.sourceforge.net/ ).
Which tests would you advise to stress the filesystems as much as possible?
I've noticed that Cerberus could be a good test.
But if anyone has already run into this problem, I would be happy to hear any
advice ;-)
Many thanks and best regards
Eric
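For a rough idea of the kind of load that tends to expose this sort of
corruption, a mix like the following can be run in parallel against the
arrays (assuming bonnie++ is installed; the paths and sizes are arbitrary
placeholders):

    # throughput and seek load on the RAID 5 array
    mkdir -p /home/stress
    bonnie++ -d /home/stress -s 4096 -u nobody &
    # large copy plus verification between two arrays
    cp -a /usr /home/copytest && diff -r /usr /home/copytest
    # raw sequential writes onto the RAID 0 /tmp array
    dd if=/dev/zero of=/tmp/bigfile bs=1M count=2048
    # repeated reads to keep the disks busy (stop with Ctrl-C)
    while true; do md5sum /home/copytest/bin/* > /dev/null; done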
* RE: MD drivers problems on 2.4.x
2003-05-26 6:17 ` Eric BENARD / Free
@ 2003-05-26 7:09 ` Riley Williams
2003-05-27 5:30 ` danci
From: Riley Williams @ 2003-05-26 7:09 UTC
To: Eric BENARD / Free, Juri Haberland; +Cc: linux-raid
Hi Eric.
>>> What is the status of the md driver in the 2.4.19 and 2.4.20
>>> kernels? I've had many problems with RAID 0 (/tmp), 1 (/, /usr/var)
>>> and 5 (/home) using these kernels and 4 IDE drives with ext3
>>> partitions. In the end, each time, I lose all the data :-(
>>>
>>> After many tests (the RAM is OK, the drives have no bad blocks), I
>>> think the problem comes from the md layer.
>>>
>>> Might switching to 2.4.21-rc3 solve my problems?
>>>
>>> Has anyone here already run into this problem?
>> Sorry, no problems here. /, /home, swap and the other stuff all
>> on RAID1 and humming along.
>>
>> We need more data from you. What is failing in which way? Any
>> error messages, kernel oopses?
> In fact, I don't get many error messages (or they were not
> reported to me by the user of this PC). I'm using a Promise
> PDC20277 IDE controller (it has pseudo-RAID functions, but I'm
> not using them).
OK so far...
> Four Western Digital 120 GB drives with 8 MB of cache are
> connected to it (master/slave, with good cables).
There was a thread on one of the Linux lists recently about Western
Digital drives that just plain vanished from the systems they were
installed in, and apparently several people were having the same
problem. Memory says it was 120 GB drives they were talking about,
but I could easily be wrong about that as I don't have the relevant
thread to hand.
You may wish to check whether the drives themselves are at fault
here, as the problem appeared to be specific to that one size of
drive, with other sizes not affected.
Failing that, there have been reports of problems with RAID setups
when drives are connected to both master and slave slots on the same
controller channel, with the recommendation that only one drive be
connected to each controller channel.
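As a quick sanity check, the channel layout, drive model and firmware
revision can be read from a running 2.4 system along these lines (hde..hdh
is only a guess at where an add-on Promise controller registers its drives):

    # which position each drive was detected at
    dmesg | grep -i 'hd[e-h]:'
    # model string as the kernel sees it
    cat /proc/ide/hde/model
    # model, firmware revision (FwRev) and buffer size
    hdparm -i /dev/hde
    # confirm that DMA is actually enabled
    hdparm -d /dev/hde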
Best wishes from Riley.
---
* Nothing as pretty as a smile, nothing as ugly as a frown.
* RE: MD drivers problems on 2.4.x
[not found] <15F26D0D9E18E24583D912511D668FF8017F8C@exchange.ad.tlogica.com>
@ 2003-05-26 8:17 ` Riley Williams
From: Riley Williams @ 2003-05-26 8:17 UTC
To: Michael Daskalov; +Cc: Linux-RAID
Hi Michael.
Unfortunately, I'm no RAID expert - I've never been in a position to
use RAID myself. As a result, I'm unable to help with your problem.
I've CC'd this to the Linux-RAID reflector where people who can help
you are likely to be found...
Best wishes from Riley.
---
* Nothing as pretty as a smile, nothing as ugly as a frown.
> -----Original Message-----
> From: Michael Daskalov [mailto:MDaskalov@technologica.biz]
> Sent: Monday, May 26, 2003 8:39 AM
> To: Riley Williams
> Subject: RE: MD drivers problems on 2.4.x
>
> Hi,
>
> I had a similar problem with kernel 2.4.18 from SuSE.
> I have 4 x 120 GB IDE disks from IBM/Hitachi.
>
> I've set up RAID 5 on 4 disks (/dev/hda7, /dev/hdb7, /dev/hdc7,
> /dev/hdd7). I've also set up RAID 1 (/dev/hda1, /dev/hdc1) and
> some other RAID 1 md devices.
>
> The system was running fine for a week but suddenly /dev/hdd
> gave me an error while reading one single block (on /dev/hdd5).
>
> I was trying to read it with dd if=/dev/hdd7 and it failed
> every time. I tried to read it with dd if=/dev/hdd and I
> succeeded.
>
> I rebooted the computer with a Hitachi test diskette and tested
> the HDD in full mode.
>
> It said there were no errors at all. I am not a S.M.A.R.T.
> expert, but I think that if it were really a hardware error, it
> would be reported in SMART's error log.
>
> Then I thought, OK - the drive should be fine, I'll just
> 'raidhotadd' it to the array and everything will be fine.
> It almost succeeded. While reconstructing the array, /dev/hda
> gave a similar error, but on another LBA sector.
>
> So, my whole raid5 was gone. Luckily it was still in test-only
> mode.
>
> I switched to a vanilla 2.4.20 that I compiled myself, and it is
> running fine for now.
>
> I just know I shouldn't count on this too much, but do I have
> an alternative?!?
>
> Here are the errors from /var/log/messages:
>
> May 6 00:15:16 devsrv kernel: hdd: dma_intr: status=0x51 {
> DriveReady SeekComplete Error }
> May 6 00:15:16 devsrv kernel: hdd: dma_intr: error=0x40 {
> UncorrectableError }, LBAsect=24145681, high=1,
> low=7368465, sector=32048
> May 6 00:15:16 devsrv kernel: end_request: I/O error, dev 16:47
> (hdd), sector 32048
> May 6 00:15:16 devsrv kernel: raid5: Disk failure on hdd7,
> disabling device. Operation continuing on 3 devices
> May 6 00:15:16 devsrv kernel: md: updating md2 RAID superblock
> on device
> May 6 00:15:16 devsrv kernel: md: (skipping faulty hdd7 )
>
> .....
>
> May 7 14:42:58 devsrv kernel: hda: dma_intr: error=0x40 {
> UncorrectableError }, LBAsect=38454698, high=2,
> low=4900266, sector=14341032
> May 7 14:42:58 devsrv kernel: end_request: I/O error, dev 03:07
> (hda), sector 14341032
> May 7 14:42:58 devsrv kernel: raid5: Disk failure on hda7,
> disabling device. Operation continuing on 2 devices
> May 7 14:42:58 devsrv kernel: md: updating md2 RAID superblock
> on device
> May 7 14:42:58 devsrv kernel: md: hdb7 [events:
> 00000046]<6>(write) hdb7's sb offset: 30724160
> May 7 14:42:58 devsrv kernel: md: recovery thread got woken up ...
> May 7 14:42:58 devsrv kernel: md2: no spare disk to reconstruct
> array! -- continuing in degraded mode
>
> .... And then
>
> May 7 14:42:58 devsrv kernel: md: recovery thread finished ...
> May 7 14:42:58 devsrv kernel: md: hdc7 [events:
> 00000046]<6>(write) hdc7's sb offset: 30724160
> May 7 14:42:58 devsrv kernel: md: (skipping faulty hda7 )
> May 7 14:43:02 devsrv kernel: vs-13070: reiserfs_read_inode2:
> i/o failure occurred trying to find stat data of [144691
> 148158 0x0 SD]
> May 7 14:43:02 devsrv kernel: vs-13070: reiserfs_read_inode2:
> i/o failure occurred trying to find stat data of [144691
> 148158 0x0 SD]
> May 7 14:43:02 devsrv kernel: vs-13070: reiserfs_read_inode2:
> i/o failure occurred trying to find stat data of [144691
> 208720 0x0 SD]
> May 7 14:43:02 devsrv kernel: vs-13070: reiserfs_read_inode2:
> i/o failure occurred trying to find stat data of [144691
> 208720 0x0 SD]
>
> Here is a very ugly bug, IMHO. See:
>
> May 7 14:42:58 devsrv kernel: raid5: Disk failure on hda7,
> disabling device. Operation continuing on 2 devices
>
> How can a RAID 5 array built on 4 devices operate with 2 devices?
> It should rather stop and do nothing...
>
> Then, after a reboot, I did
> mkraid --dangerous-no-resync --force /dev/md2
> and I was able to mount the reiserfs 3.6 filesystem. I was able to read
> some data from some files, but it was very heavily corrupted.
> The Oracle installation that was there was useless (sqlplus gave
> segmentation faults), and so on.
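For readers following along, the sequence described above boils down roughly
to the raidtools commands below. This is only a sketch: mkraid with
--dangerous-no-resync rewrites the RAID superblocks from /etc/raidtab without
resyncing, so it is a last resort that only makes sense with a correct raidtab
and the disks listed in their original order.

    # see which members md still considers active
    cat /proc/mdstat
    # re-add a kicked disk and let the raid5 code resynchronise it
    raidhotadd /dev/md2 /dev/hdd7
    # last resort: recreate the superblocks without resyncing the data
    mkraid --dangerous-no-resync --force /dev/md2
    # check the result read-only before trusting it
    mount -o ro /dev/md2 /mnt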
* Re: MD drivers problems on 2.4.x
2003-05-26 6:17 ` Eric BENARD / Free
2003-05-26 7:09 ` Riley Williams
@ 2003-05-27 5:30 ` danci
2003-05-27 6:20 ` Eric BENARD / Free
2003-05-28 11:33 ` Eric BENARD / Free
From: danci @ 2003-05-27 5:30 UTC
To: Eric BENARD / Free; +Cc: linux-raid
On Mon, 26 May 2003, Eric BENARD / Free wrote:
> In fact, I don't get many error messages (or they were not reported to me
> by the user of this PC).
> I'm using a Promise PDC20277 IDE controller (it has pseudo-RAID functions,
> but I'm not using them).
> Four Western Digital 120 GB drives with 8 MB of cache are connected to it
> (master/slave, with good cables).
There is your problem - Western Digital. The firmware on some WD drives
is very BAD and WD offers an upgrade. Go check out their web site...
D.
* Re: MD drivers problems on 2.4.x
2003-05-27 5:30 ` danci
@ 2003-05-27 6:20 ` Eric BENARD / Free
2003-05-28 11:33 ` Eric BENARD / Free
From: Eric BENARD / Free @ 2003-05-27 6:20 UTC
To: danci; +Cc: linux-raid
On Tuesday 27 May 2003 at 07:30, danci@agenda.si wrote:
> On Mon, 26 May 2003, Eric BENARD / Free wrote:
> > In fact, I don't get many error messages (or they were not reported to
> > me by the user of this PC).
> > I'm using a Promise PDC20277 IDE controller (it has pseudo-RAID
> > functions, but I'm not using them).
> > Four Western Digital 120 GB drives with 8 MB of cache are connected to
> > it (master/slave, with good cables).
>
> There is your problem - Western Digital. The firmware on some WD drives
> is very BAD and WD offers an upgrade. Go check out their web site...
>
Thanks! Unfortunately, my drives don't seem to be affected by this
problem (in fact, my drives only have 4 MB of cache).
Best regards
Eric
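For what it's worth, the firmware revision (FwRev) and on-drive buffer size
(BuffSize) that a drive reports can be read with hdparm; the device name
below is just a placeholder for one of the drives on the Promise controller:

    # model, firmware revision and cache size from the drive's IDENTIFY data
    hdparm -i /dev/hde | grep -E 'Model|BuffSize'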
* Re: MD drivers problems on 2.4.x
2003-05-27 5:30 ` danci
2003-05-27 6:20 ` Eric BENARD / Free
@ 2003-05-28 11:33 ` Eric BENARD / Free
From: Eric BENARD / Free @ 2003-05-28 11:33 UTC
To: danci; +Cc: linux-raid
On Tuesday 27 May 2003 at 07:30, danci@agenda.si wrote:
> There is your problem - Western Digital. The firmware on some WD drives
> is very BAD and WD offers an upgrade. Go check out their web site...
>
In fact, maybe you were right: I upgraded the firmware of the 4 disks (even
though Western Digital says my drives were not affected by the problem),
upgraded to kernel 2.4.21-rc4, and ran heavy tests on the system without any
crash (400 GB transferred between the RAID filesystems, 400 GB transferred
from the network at the same time, cpuburn running...).
Many thanks !
Eric