linux-ide.vger.kernel.org archive mirror
* Random shutdown of disks using sata_mv
@ 2009-12-01 12:15 Caspar Smit
  2009-12-01 14:29 ` Simon Jackson
  2009-12-01 15:45 ` saeed bishara
  0 siblings, 2 replies; 7+ messages in thread
From: Caspar Smit @ 2009-12-01 12:15 UTC (permalink / raw)
  To: linux-ide



Hi,

I'm having a problem where, at random, one of my disks shuts down and is
disconnected from the Linux kernel. In other words, I have to reboot the
system or physically unplug/replug the disk to get it to work again.

I will provide my configuration:

SuperMicro SC-216 chassis (24-bay, 2.5" disks)
24x Seagate ST9500420AS 500 GB 7200 RPM hard drives
3x SuperMicro AOC-SAT2-MV8 (SATA controller using the sata_mv kernel driver)

I use Debian Lenny 5.0 and kernel linux-image-2.6.30-bpo.2-amd64
(2.6.30-8~bpo50+1) from the backports repository.

The symptom is that, after some time in operation, a disk is shut down and
kicked out of a RAID set. It doesn't matter whether the system is under
load or not.

The log says:

sd 11:0:0:0: [sdk] Unhandled error code
sd 11:0:0:0: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: end_request: I/O error, dev sdk, sector 0

In this case it was sdk, but it happens to all disks. After that, the disk
is no longer readable by the system.
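
To back up the "it happens to all disks" observation with numbers, the I/O error lines can be tallied per device. The following is a minimal Python sketch, not part of the original post; it assumes the log lines look like the excerpt above, and the function name count_io_errors is made up for illustration.

```python
import re
from collections import Counter

# Matches the block-layer error line from the excerpt, e.g.
# "end_request: I/O error, dev sdk, sector 0"
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def count_io_errors(lines):
    """Return a Counter mapping device name -> number of I/O error lines."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Sample input copied from the log excerpt in this message.
    sample = [
        "sd 11:0:0:0: [sdk] Unhandled error code",
        "sd 11:0:0:0: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK",
        "sd 11:0:0:0: end_request: I/O error, dev sdk, sector 0",
    ]
    for device, hits in count_io_errors(sample).most_common():
        print(device, hits)
```

Feeding it the full kernel log instead of the sample would show whether the errors are spread evenly over the disks or cluster on a few of them.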

When I check the disk for errors (badblocks/SMART) in another system, it
doesn't report any. I only see this on 2.5" systems.

Is this a sata_mv problem, a disk problem, or something else?
I can provide more info if needed.

Kind regards,
Caspar Smit


* RE: Random shutdown of disks using sata_mv
@ 2009-12-02 10:40 Caspar Smit
  2009-12-02 15:54 ` saeed bishara
  0 siblings, 1 reply; 7+ messages in thread
From: Caspar Smit @ 2009-12-02 10:40 UTC (permalink / raw)
  To: Simon Jackson; +Cc: linux-ide@vger.kernel.org



Hi Simon,

We are not experiencing that "FAILED TO IDENTIFY" error.

Kind regards,
Caspar


> We are investigating a similar type of problem seen on several of our
> systems. Seemingly at random (though some systems seem more susceptible
> than others) we see the ata link reset and subsequently there is a
> FAILED TO IDENTIFY error logged. smartctl is unable to get information
> from the drive and a power cycle of the drive is required to bring it
> back on line.
>
> I would be interested to know if the ata level errors are similar to
> those we are seeing.


* Re: Random shutdown of disks using sata_mv
@ 2009-12-03 11:27 Caspar Smit
  0 siblings, 0 replies; 7+ messages in thread
From: Caspar Smit @ 2009-12-03 11:27 UTC (permalink / raw)
  To: linux-ide@vger.kernel.org



> the following lines from the kern.log:
> Nov 24 23:03:23 supernas02 kernel: [131523.808631] ata19.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> Nov 24 23:03:23 supernas02 kernel: [131523.808690] ata19.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0
> Nov 24 23:03:23 supernas02 kernel: [131523.808691]          res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> Nov 24 23:03:23 supernas02 kernel: [131523.808770] ata19.00: status: { DRDY }
> Nov 24 23:03:23 supernas02 kernel: [131523.808801] ata19: hard resetting link
> Nov 24 23:03:28 supernas02 kernel: [131529.324010] ata19: link is slow to respond, please be patient (ready=0)
> Nov 24 23:03:33 supernas02 kernel: [131533.860010] ata19: SRST failed (errno=-16)
> Nov 24 23:03:33 supernas02 kernel: [131533.860038] ata19: hard resetting link
> Nov 24 23:03:38 supernas02 kernel: [131539.376009] ata19: link is slow to respond, please be patient (ready=0)
> Nov 24 23:03:43 supernas02 kernel: [131543.912006] ata19: SRST failed (errno=-16)
> Nov 24 23:03:43 supernas02 kernel: [131543.912033] ata19: hard resetting link
> Nov 24 23:03:48 supernas02 kernel: [131549.428010] ata19: link is slow to respond, please be patient (ready=0)
> Nov 24 23:04:18 supernas02 kernel: [131578.940012] ata19: SRST failed (errno=-16)
> Nov 24 23:04:18 supernas02 kernel: [131578.940048] ata19: limiting SATA link speed to 1.5 Gbps
> Nov 24 23:04:18 supernas02 kernel: [131578.940077] ata19: hard resetting link
> Nov 24 23:04:23 supernas02 kernel: [131583.952009] ata19: SRST failed (errno=-16)
> Nov 24 23:04:23 supernas02 kernel: [131583.958191] ata19: reset failed, giving up
> Nov 24 23:04:23 supernas02 kernel: [131583.958218] ata19.00: disabled
> Nov 24 23:04:23 supernas02 kernel: [131583.958253] ata19: EH complete
>
> means that a timeout error occurred, and after that the disk didn't
> respond.
> Is it the same disk that fails all the time?

As the subject says, it is random, so it is not the same disk every time.
This makes it extra hard to troubleshoot.
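
One way to check whether the failures really are spread over all ports is to summarise the libata error-handling messages per ata port. What follows is a minimal Python sketch, not something posted in the thread; it assumes kern.log lines in the format quoted above, and the name summarize_ports and the counter keys are invented for illustration.

```python
import re
from collections import defaultdict

# Matches e.g. "ata19: hard resetting link" or "ata19.00: disabled".
# Group 1 is the port number; an optional ".NN" device suffix is ignored.
ATA_LINE = re.compile(r"\bata(\d+)(?:\.\d+)?: (.*)$")


def summarize_ports(lines):
    """Return {port: counters} for the libata EH messages found in lines."""
    summary = defaultdict(lambda: {
        "exceptions": 0,
        "hard_resets": 0,
        "srst_failed": 0,
        "disabled": False,
    })
    for line in lines:
        match = ATA_LINE.search(line.rstrip())
        if not match:
            continue  # e.g. the "res ..." continuation line has no ataNN prefix
        port, message = int(match.group(1)), match.group(2)
        if message.startswith("exception"):
            summary[port]["exceptions"] += 1
        elif message.startswith("hard resetting link"):
            summary[port]["hard_resets"] += 1
        elif message.startswith("SRST failed"):
            summary[port]["srst_failed"] += 1
        elif message == "disabled":
            summary[port]["disabled"] = True
    return dict(summary)


if __name__ == "__main__":
    # A few lines taken from the quoted kern.log excerpt.
    sample = [
        "Nov 24 23:03:23 supernas02 kernel: [131523.808631] ata19.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
        "Nov 24 23:03:23 supernas02 kernel: [131523.808801] ata19: hard resetting link",
        "Nov 24 23:03:33 supernas02 kernel: [131533.860010] ata19: SRST failed (errno=-16)",
        "Nov 24 23:04:23 supernas02 kernel: [131583.958218] ata19.00: disabled",
    ]
    for port, counters in sorted(summarize_ports(sample).items()):
        print(f"ata{port}: {counters}")
```

Run over several weeks of kern.log, a summary like this would show at a glance which ports were disabled and how many reset attempts preceded each failure.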

Kind regards,
Caspar

> 
> saeed
> 


Thread overview: 7+ messages
2009-12-01 12:15 Random shutdown of disks using sata_mv Caspar Smit
2009-12-01 14:29 ` Simon Jackson
2009-12-15  5:38   ` Tejun Heo
2009-12-01 15:45 ` saeed bishara
  -- strict thread matches above, loose matches on Subject: below --
2009-12-02 10:40 Caspar Smit
2009-12-02 15:54 ` saeed bishara
2009-12-03 11:27 Caspar Smit
