Need to upgrade to latest stable mdadm version?

linux-raid.vger.kernel.org archive mirror

* Need to upgrade to latest stable mdadm version?
@ 2006-01-22 17:41 Mitchell Laks
  2006-01-22 17:49 ` David Greaves
  2006-01-22 18:20 ` Gordon Henderson
  0 siblings, 2 replies; 8+ messages in thread
From: Mitchell Laks @ 2006-01-22 17:41 UTC (permalink / raw)
  To: linux-raid

Hi,

I am running several Debian Sarge servers with crucial data and need to 
update (gulp) to the latest stable Linux kernel 2.6.15.1. :( I am terrified.

I would like to match the latest kernel to the latest stable mdadm. What can I 
do to make that match?  I thought that perhaps I should use mdadm 2.2: 
however I have been lurking reading the emails on this list and I see scary 
messages like

1) mdadm-2.2 SEGFAULT: mdadm --assemble --scan from 12-22 (see patches that 
are sent in)
2) mdadm-2.2 typo dated 1-12-06

How can I get the most current patched version?

Why am I doing this? I am running a few Debian Sarge servers. I would prefer 
to run everything from Sarge (including the kernel...).  However I will be 
using SATA controllers in my latest installs, and having tried out a few, I 
discovered that neither the standard 2.6.8-2 Debian Sarge kernel nor even the 
2.6.12 testing kernel will work with the SATA controllers I have (2.6.15 in 
sid does). 

I have a Promise SATAII 150 TX4 SATA controller card as well as an on-board 
VIA VT8237 SATA controller on my Asus A8V motherboard. I found that 
with the Debian Sarge kernel 2.6.8-2 the sata_promise module is not 
working, 
and with the Etch testing 2.6.12 the sata_via module fails with

mdadm -Cv /dev/md0 -n2 -l1 /dev/sda1 /dev/sdb1
mkfs.ext3 /dev/md0

causing reproducible kernel error messages

ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x84 { DriveStatusError BadCRC }
ata1: command 0x35 timeout, stat 0xd0 host_stat 0x0
ata1: status=0xd0 { Busy }
SCSI error : <0 0 0 0> return code = 0x8000002
sda: Current: sense key: Aborted Command
    Additional sense: Scsi parity error
end_request: I/O error, dev sda, sector 94109775
raid1: Disk failure on sda1, disabling device.
        Operation continuing on 1 devices
ATA: abnormal status 0xD0 on port 0xC007
ATA: abnormal status 0xD0 on port 0xC007
ATA: abnormal status 0xD0 on port 0xC007
ata1: command 0x35 timeout, stat 0xd0 host_stat 0x1
ata1: status=0xd0 { Busy }
SCSI error : <0 0 0 0> return code = 0x8000002
sda: Current: sense key: Aborted Command
    Additional sense: Scsi parity error
end_request: I/O error, dev sda, sector 94109783
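A log like the one above can be summarised per device before drawing conclusions. The following is only a minimal sketch: the function name io_errors_by_dev and the idea of saving the kernel log to a file first are assumptions for illustration, and it only understands the "end_request: I/O error, dev X, sector N" line format shown here.

```shell
# Count "end_request: I/O error" lines per block device in a saved kernel log.
# io_errors_by_dev is a hypothetical helper; $1 is the path to the saved log.
io_errors_by_dev() {
    awk '
        /end_request: I\/O error, dev / {
            # The device name is the field following the word "dev".
            for (i = 1; i < NF; i++) {
                if ($i == "dev") {
                    name = $(i + 1)
                    sub(/,$/, "", name)   # strip the trailing comma
                    count[name]++
                }
            }
        }
        END { for (name in count) print name, count[name] }
    ' "$1" | sort
}
```

For example, after saving the log with "dmesg > kern.txt", running "io_errors_by_dev kern.txt" prints one line per device with its error count.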

Note these messages are not caused by the hard drives (reproduced on multiple 
hard drives which work fine with the Promise controller), nor by the 
controller hardware (it occurs with multiple Asus A8V motherboards) - because it 
only happens when both the sata_via and sata_promise drivers are loaded by 
the Etch Debian 2.6.12 kernel. If only sata_via is loaded, the system works fine. 
Moreover it does not happen with the Debian sid 2.6.15 kernel :).
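Since the failure seems tied to which SATA drivers are loaded together, it helps to record that state alongside each test. This is a rough sketch, assuming the usual /proc/modules layout where the module name is the first field; the helper names loaded_sata_drivers and multiple_sata_drivers are made up for illustration.

```shell
# List loaded kernel modules whose names start with "sata_".
# Reads /proc/modules by default; pass a file to inspect a saved copy instead.
loaded_sata_drivers() {
    awk '$1 ~ /^sata_/ { print $1 }' "${1:-/proc/modules}" | sort
}

# Exit status 0 if more than one sata_* driver is loaded, non-zero otherwise.
multiple_sata_drivers() {
    n=$(loaded_sata_drivers "$@" | awk 'END { print NR }')
    [ "$n" -gt 1 ]
}
```

A test run could then start with something like: multiple_sata_drivers && echo "warning: more than one SATA driver loaded".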

So I have to compile my own 2.6.15 kernel. So what version of mdadm do I use? 
How shall I install it? 

Does it make sense to simply compile mdadm  2.2 and replace /sbin/mdadm with 
the new version?????? How can I get the best recent mdadm? I am using raid1.

Thank you all for all your help in the past!
Mitchell Laks


* Re: Need to upgrade to latest stable mdadm version?
  2006-01-22 17:41 Need to upgrade to latest stable mdadm version? Mitchell Laks
@ 2006-01-22 17:49 ` David Greaves
  2006-01-22 22:31   ` Mark Hahn
  2006-01-23 14:02   ` Need to upgrade to latest stable mdadm version? Mitchell Laks
  2006-01-22 18:20 ` Gordon Henderson
  1 sibling, 2 replies; 8+ messages in thread
From: David Greaves @ 2006-01-22 17:49 UTC (permalink / raw)
  To: Mitchell Laks; +Cc: linux-raid

Mitchell Laks wrote:

><snip>
>I have a Promise SATAII 150 TX4 SATA controller card as well as an on-board 
>VIA VT8237 SATA controller on my Asus A8V motherboard. I found that 
>with the Debian Sarge kernel 2.6.8-2 the sata_promise module is not 
>working, 
>and with the Etch testing 2.6.12 the sata_via module fails with
>
>mdadm -Cv /dev/md0 -n2 -l1 /dev/sda1 /dev/sdb1
>mkfs.ext3 /dev/md0
>
>causing reproducible kernel error messages
>
>ata1: status=0x51 { DriveReady SeekComplete Error }
>ata1: error=0x84 { DriveStatusError BadCRC }
>ata1: command 0x35 timeout, stat 0xd0 host_stat 0x0
>ata1: status=0xd0 { Busy }
>SCSI error : <0 0 0 0> return code = 0x8000002
>sda: Current: sense key: Aborted Command
>    Additional sense: Scsi parity error
>end_request: I/O error, dev sda, sector 94109775
>raid1: Disk failure on sda1, disabling device.
>        Operation continuing on 1 devices
>ATA: abnormal status 0xD0 on port 0xC007
>ATA: abnormal status 0xD0 on port 0xC007
>ATA: abnormal status 0xD0 on port 0xC007
>ata1: command 0x35 timeout, stat 0xd0 host_stat 0x1
>ata1: status=0xd0 { Busy }
>SCSI error : <0 0 0 0> return code = 0x8000002
>sda: Current: sense key: Aborted Command
>    Additional sense: Scsi parity error
>end_request: I/O error, dev sda, sector 94109783
>
>Note these messages are not caused by the hard drives (reproduced on multiple 
>hard drives which work fine with the Promise controller), nor by the 
>controller hardware (it occurs with multiple Asus A8V motherboards) - because it 
>only happens when both the sata_via and sata_promise drivers are loaded by 
>the Etch Debian 2.6.12 kernel. If only sata_via is loaded, the system works fine. 
>Moreover it does not happen with the Debian sid 2.6.15 kernel :).
>
>So I have to compile my own 2.6.15 kernel. So what version of mdadm do I use? 
>How shall I install it? 
>
>Does it make sense to simply compile mdadm  2.2 and replace /sbin/mdadm with 
>the new version?????? How can I get the best recent mdadm? I am using raid1.
>
>Thank you all for all your help in the past!
>Mitchell Laks
>  
>

Just FYI
I am running the *stock* 2.6.15 and get the same problems (ata timeouts etc)

I recently wrote to the lkml and ide lists following up an old post.
No replies yet.

I run sata_via and sata_sil
(I've seen just a few people with these problems - various kernels -
all, to my recollection, seem to have two sets of sata controller
chips.... I wonder....)

I note you say the *sid* kernel doesn't have these problems so I'll try
that. I just wanted to mention that the problem may exist on stock
kernels 'cos you talk about rolling your own...

David




* Re: Need to upgrade to latest stable mdadm version?
  2006-01-22 17:41 Need to upgrade to latest stable mdadm version? Mitchell Laks
  2006-01-22 17:49 ` David Greaves
@ 2006-01-22 18:20 ` Gordon Henderson
  1 sibling, 0 replies; 8+ messages in thread
From: Gordon Henderson @ 2006-01-22 18:20 UTC (permalink / raw)
  To: Mitchell Laks; +Cc: linux-raid

On Sun, 22 Jan 2006, Mitchell Laks wrote:

> So I have to compile my own 2.6.15 kernel. So what version of mdadm do I use?
> How shall I install it?

I'm using a stock (www.kernel.org) 2.6.15 kernel on several Debian Sarge
servers and just using the debian packaged mdadm. Seems to work OK.

> Does it make sense to simply compile mdadm  2.2 and replace /sbin/mdadm with
> the new version?????? How can I get the best recent mdadm? I am using raid1.

Personally, I'd say don't upgrade unless you actually need the features of
the latest & greatest....

Debian sarge ships with 1.9.0:

  eagle:~# uname -a
  Linux eagle 2.6.15 #1 PREEMPT Sat Jan 21 15:59:40 GMT 2006 i686 GNU/Linux
  eagle:~# mdadm --version
  mdadm - v1.9.0 - 04 February 2005
  eagle:~# dpkg -l | fgrep mdadm
  ii  mdadm          1.9.0-4sarge1  Manage MD devices aka Linux Software Raid
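
If the installed version needs to be checked from a script rather than by eye, the banner shown above can be parsed. A minimal sketch, assuming the banner keeps the "mdadm - vX.Y.Z - date" shape seen in this transcript; mdadm_version_from_banner is a hypothetical helper name, not part of mdadm.

```shell
# Extract the bare version number from an mdadm banner line such as
#   mdadm - v1.9.0 - 04 February 2005
# Prints nothing if the line does not match that shape.
mdadm_version_from_banner() {
    printf '%s\n' "$1" | sed -n 's/^mdadm - v\([0-9][0-9.]*\).*/\1/p'
}
```

Usage would be along the lines of: mdadm_version_from_banner "$(mdadm --version 2>&1)", where the 2>&1 simply captures the banner whichever stream it is written to.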

Gordon


* Re: Need to upgrade to latest stable mdadm version?
  2006-01-22 17:49 ` David Greaves
@ 2006-01-22 22:31   ` Mark Hahn
  2006-01-23 12:31     ` Possible libata/sata/Asus problem (was Re: Need to upgrade to latest stable mdadm version?) David Greaves
  2006-01-23 14:02   ` Need to upgrade to latest stable mdadm version? Mitchell Laks
  1 sibling, 1 reply; 8+ messages in thread
From: Mark Hahn @ 2006-01-22 22:31 UTC (permalink / raw)
  To: David Greaves; +Cc: Mitchell Laks, linux-raid

> >and with the ETCH testing 2.6.12:  the sata_via module fails with

I'm sure you know that no kernel developer really cares about distro-hacked 
kernels.  why not test a real (kernel.org) kernel?

> >ata1: status=0x51 { DriveReady SeekComplete Error }
> >ata1: error=0x84 { DriveStatusError BadCRC }

badcrc's are a sign that the link is failing - bad cable, bad power,
overclocking, possibly an error in the driver's timing config.
it cannot possibly be an mdadm problem, and cannot be related to 
other software (kernel memory management, say.)
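
To make the status bytes in those log lines concrete, here is a minimal sketch that decodes an ATA status register value into the labels the kernel prints. The bit positions are the standard ATA status bits; DriveReady, SeekComplete, Error and Busy are the labels seen in the log above, while the remaining label names and the function name decode_ata_status are assumptions for illustration.

```shell
# Decode an ATA status byte (e.g. 0x51) into human-readable flag names.
# When the busy bit (0x80) is set the other bits are not meaningful,
# so only "Busy" is reported, matching the log lines quoted above.
decode_ata_status() {
    s=$(( $1 ))
    if [ $(( s & 0x80 )) -ne 0 ]; then
        echo "Busy"
        return 0
    fi
    out=""
    if [ $(( s & 0x40 )) -ne 0 ]; then out="$out DriveReady"; fi
    if [ $(( s & 0x20 )) -ne 0 ]; then out="$out DeviceFault"; fi      # assumed label
    if [ $(( s & 0x10 )) -ne 0 ]; then out="$out SeekComplete"; fi
    if [ $(( s & 0x08 )) -ne 0 ]; then out="$out DataRequest"; fi      # assumed label
    if [ $(( s & 0x04 )) -ne 0 ]; then out="$out CorrectedError"; fi   # assumed label
    if [ $(( s & 0x02 )) -ne 0 ]; then out="$out Index"; fi            # assumed label
    if [ $(( s & 0x01 )) -ne 0 ]; then out="$out Error"; fi
    echo "${out# }"   # drop the leading space
}
```

Calling decode_ata_status 0x51 gives the three flags from the first quoted line; the error register (0x84 in the log) is a separate byte and is not decoded by this sketch.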

> >ata1: command 0x35 timeout, stat 0xd0 host_stat 0x0
> >ata1: status=0xd0 { Busy }
> >SCSI error : <0 0 0 0> return code = 0x8000002
> >sda: Current: sense key: Aborted Command
> >    Additional sense: Scsi parity error
> >end_request: I/O error, dev sda, sector 94109775
> >raid1: Disk failure on sda1, disabling device.

I don't know the code well enough to tell whether that CRC error is
triggering that cascade of scsi->raid errors.

> >Note these messages are not caused by the hard drives (reproduced on multiple 
> >hard drives which work fine with the Promise controller), nor by the 
> >controller hardware (it occurs with multiple Asus A8V motherboards) - because it 

being triggered by a CRC error means that it's definitely a problem in the 
cable, though it may simply be that something has mis-programmed the timing
of the port.  working with a different controller also doesn't rule out 
power problems.

> >only happens when both the sata_via and sata_promise controller are loaded by 
> >the ETCH debian 2.6.12 kernel.

I don't see how driver interaction could cause the BadCRC, unless one driver
is screwing with the timing registers of the other's hardware.

> >So I have to compile my own 2.6.15 kernel. So what version of mdadm do I use? 
> >How shall I install it? 

why do you think mdadm has anything to do with it?  it's a user-level tool
for manipulating md.  it only knows blockdevs, not drivers, CRC's, etc.

> >Does it make sense to simply compile mdadm  2.2 and replace /sbin/mdadm with 
> >the new version?????? How can I get the best recent mdadm? I am using raid1.

mdadm cannot possibly have anything to do with causing BadCRC's.  upgrade if
you feel like it, but not because of this problem.

> I am running the *stock* 2.6.15 and get the same problems (ata timeouts etc)

is there a reason you call this a timeout, rather than a BadCRC?


* Possible libata/sata/Asus problem (was Re: Need to upgrade to latest stable mdadm version?)
  2006-01-22 22:31   ` Mark Hahn
@ 2006-01-23 12:31     ` David Greaves
  2006-01-23 17:05       ` (unknown), Shawn Usry
  0 siblings, 1 reply; 8+ messages in thread
From: David Greaves @ 2006-01-23 12:31 UTC (permalink / raw)
  To: Mark Hahn; +Cc: Mitchell Laks, linux-raid, linux-kernel, IDE Linux

Mark Hahn wrote:

>>>and with the ETCH testing 2.6.12:  the sata_via module fails with
>>>      
>>>
>I'm sure you know that no kernel developer really cares about distro-hacked 
>kernels.  why not test a real (kernel.org) kernel?
>  
>
Only because if the problem exists on the stock kernel and not on the
distro kernel then there could be assistance in determining which patch
solves (or hides!) the problem. This may or may not actually be helpful.


>>>ata1: status=0x51 { DriveReady SeekComplete Error }
>>>ata1: error=0x84 { DriveStatusError BadCRC }
>>>      
>>>
>badcrc's are a sign that the link is failing - bad cable, bad power,
>overclocking,
>
OK

>possibly an error in the driver's timing config.
>  
>
A-ha!

>it cannot possibly be an mdadm problem, and cannot be related to 
>other software (kernel memory management, say.)
>  
>
Agreed.

>I don't see how driver interaction could cause the BadCRC, unless one driver
>is screwing with the timing registers of the other's hardware.
>
And maybe, on a lightly loaded system, RAID causes concurrent access
(and potentially triggers problems) more often than a non-RAID solution?

>mdadm cannot possibly have anything to do with causing BadCRC's.  upgrade if
>you feel like it, but not because of this problem.
>
Completely agree.

>>I am running the *stock* 2.6.15 and get the same problems (ata timeouts etc)
>>    
>>
>is there a reason you call this a timeout, rather than a BadCRC?
>  
>
I had:
  ata2: command 0x25 timeout, stat 0x51 host_stat 0x0

It stuck in my head. It's not that representative. My bad.

most errors (for me) were:

Jan 19 15:23:05 haze kernel: ata1: PIO error
Jan 19 15:23:05 haze kernel: ata1: status=0x50 { DriveReady SeekComplete }
Jan 19 15:23:05 haze kernel: ata1: PIO error
Jan 19 15:23:05 haze kernel: ata1: status=0x50 { DriveReady SeekComplete }
Jan 19 15:23:05 haze kernel: ata1: PIO error


But if you look at:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113769509617034&w=2

you'll see that I had what looked like a 'spurious' bad-block error -
and I have an Asus motherboard, am using the sata_via driver, have
another sata driver loaded and am using md - all under 2.6.15 .... hence
the tentative association of the problems :)

Oh, and libata's error handling is embryonic - maybe it should be
retrying. I dunno.

I've seen other potentially related problems in the
sata/motherboard/raid area.
Personally I suspect buggy Asus motherboards.
I wonder if the bug is triggered by multiple drivers or some concurrency
- hence raid's involvement...
(Since I suspect md is actually tickling it, not causing it, I'm moving
this to lkml and linux-ide too)

Of course I plan to do some tests - but mentioning it may give others
ideas too... And maybe I/we'll get suggestions as to what to try next...

David

-- 



* Re: Need to upgrade to latest stable mdadm version?
  2006-01-23 14:02   ` Need to upgrade to latest stable mdadm version? Mitchell Laks
@ 2006-01-23 13:58     ` Brad Campbell
  0 siblings, 0 replies; 8+ messages in thread
From: Brad Campbell @ 2006-01-23 13:58 UTC (permalink / raw)
  To: Mitchell Laks; +Cc: linux-raid

Mitchell Laks wrote:

> It does not work with the debian sid 2.6.15 kernel. You were far more 
> observant than I. I thought that my 2 raids were working with the debian 
> 2.6.15 but in fact one of the drives had failed out of the array (that's 
> why it seemed to work, I hadn't slept enough...).
> 
> In fact your diagnosis is completely correct. I cannot get both sata_via and 
> sata_promise up at the same time, without timeout problems. 

I _hope_ this is just an anomaly as I have a production machine here with 2.6.10, 3 sata_promise 
cards and using the on-board sata_via interface also.. I was planning on upgrading it to 2.6.15 or 
2.6.16 when it's released to gain the benefit of the new raid-5 error recovery patches.


-- 
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams


* Re: Need to upgrade to latest stable mdadm version?
  2006-01-22 17:49 ` David Greaves
  2006-01-22 22:31   ` Mark Hahn
@ 2006-01-23 14:02   ` Mitchell Laks
  2006-01-23 13:58     ` Brad Campbell
  1 sibling, 1 reply; 8+ messages in thread
From: Mitchell Laks @ 2006-01-23 14:02 UTC (permalink / raw)
  To: linux-raid

On Sunday 22 January 2006 12:49 pm, David Greaves wrote:

> Just FYI
> I am running the *stock* 2.6.15 and get the same problems (ata timeouts
> etc)
>
> I recently wrote to the lkml and ide lists following up an old post.
> No replies yet.
>
> I run sata_via and sata_sil
> (I've seen just a few people with these problems - various kernels -
> all, to my recollection, seem to have two sets of sata controller
> chips.... I wonder....)
>
> I note you say the *sid* kernel doesn't have these problems so I'll try
> that. I just wanted to mention that the problem may exist on stock
> kernels 'cos you talk about rolling your own...
David! you are right!

It does not work with the debian sid 2.6.15 kernel. You were far more 
observant than I. I thought that my 2 raids were working with the debian 
2.6.15 but in fact one of the drives had failed out of the array (that's 
why it seemed to work, I hadn't slept enough...).
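
A degraded array that still "seems to work" is easy to miss by eye, so a small check can help. This is only a sketch, assuming the /proc/mdstat layout in which each array's status line carries a "[total/active]" pair such as [2/1]; degraded_arrays is a hypothetical helper name.

```shell
# Print the name of every md array whose active-device count is below
# its configured device count, e.g. a raid1 showing "[2/1] [_U]".
# Reads /proc/mdstat by default; pass a file to check a saved copy.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }            # remember the current array
        match($0, /\[[0-9]+\/[0-9]+\]/) {
            # Take the text between the brackets and split "total/active".
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) print name
        }
    ' "${1:-/proc/mdstat}"
}
```

Running degraded_arrays after each kernel or driver change would have flagged the drive that dropped out; empty output means every array reports all of its devices active.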

In fact your diagnosis is completely correct. I cannot get both sata_via and 
sata_promise up at the same time, without timeout problems. 

I am hereby switching exclusively to sata_promise as I can install multiple 
cards on the same machine.

Mitchell



* (unknown), 
  2006-01-23 12:31     ` Possible libata/sata/Asus problem (was Re: Need to upgrade to latest stable mdadm version?) David Greaves
@ 2006-01-23 17:05       ` Shawn Usry
  0 siblings, 0 replies; 8+ messages in thread
From: Shawn Usry @ 2006-01-23 17:05 UTC (permalink / raw)
  To: linux-raid

help

