* [BUG REPORT, 2.6.22] sata controler failure on nforce 2 chipset
From: speedy @ 2008-04-22 23:14 UTC
To: linux-kernel
Hello Linux kernel crew,
[Consider this more as a datapoint than a bug report, as after
one network issue and one sata/southbridge issue showing up
intermittently, the ASRock motherboard involved will be
scrapped for a different one]
The integrated NVidia sata controller and/or the hard-drive has failed
during operation with the following output:
Apr 22 23:36:54 backupserver kernel: [91202.294632] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 22 23:36:59 backupserver kernel: [91207.657630] ata2: port is slow to respond, please be patient (Status 0xd0)
Apr 22 23:37:04 backupserver kernel: [91212.331576] ata2: device not ready (errno=-16), forcing hardreset
Apr 22 23:37:04 backupserver kernel: [91212.331583] ata2: hard resetting port
Apr 22 23:37:09 backupserver kernel: [91217.874396] ata2: port is slow to respond, please be patient (Status 0x80)
Apr 22 23:37:14 backupserver kernel: [91222.368598] ata2: hard resetting port
Apr 22 23:37:19 backupserver kernel: [91227.911395] ata2: port is slow to respond, please be patient (Status 0x80)
Apr 22 23:37:24 backupserver kernel: [91232.405597] ata2: hard resetting port
Apr 22 23:37:29 backupserver kernel: [91237.948395] ata2: port is slow to respond, please be patient (Status 0x80)
Apr 22 23:37:59 backupserver kernel: [91267.370311] ata2: hard resetting port
Apr 22 23:38:04 backupserver kernel: [91272.373843] ata2.00: disabled
Apr 22 23:38:04 backupserver kernel: [91272.373858] ata2: EH complete
Apr 22 23:38:04 backupserver kernel: [91272.374653] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Apr 22 23:38:04 backupserver kernel: [91272.374659] end_request: I/O error, dev sdb, sector 35277535
Apr 22 23:38:04 backupserver kernel: [91272.374682] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374706] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374726] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374745] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374765] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374785] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374805] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374825] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374844] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374864] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.375058] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Apr 22 23:38:04 backupserver kernel: [91272.375062] end_request: I/O error, dev sdb, sector 35278559
Apr 22 23:38:04 backupserver kernel: [91272.375096] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Apr 22 23:38:04 backupserver kernel: [91272.375099] end_request: I/O error, dev sdb, sector 407240943
.
.
.
Full /var/log/messages can be found at: http://87.230.23.147/messages_sata_crash.txt
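For anyone triaging a similar log, here is a minimal sketch that condenses an excerpt like the one above into counts of hard resets per port, failed sectors per device, and lost page writes per md device. The regular expressions only match the message wording seen in this 2.6.22 excerpt, and the function name summarize is made up for illustration; other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns derived from the log excerpt above (assumption: same wording).
RESET_RE = re.compile(r"\b(ata\d+): hard resetting port")
IO_ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
LOST_WRITE_RE = re.compile(r"lost page write due to I/O error on (\w+)")


def summarize(lines):
    """Return (resets per port, failed sectors per device, lost writes per md device)."""
    resets = Counter()
    lost_writes = Counter()
    bad_sectors = {}
    for line in lines:
        m = RESET_RE.search(line)
        if m:
            resets[m.group(1)] += 1
            continue
        m = IO_ERROR_RE.search(line)
        if m:
            bad_sectors.setdefault(m.group(1), []).append(int(m.group(2)))
            continue
        m = LOST_WRITE_RE.search(line)
        if m:
            lost_writes[m.group(1)] += 1
    return resets, bad_sectors, lost_writes


# A few lines copied from the excerpt above, used as sample input.
SAMPLE = """\
Apr 22 23:37:04 backupserver kernel: [91212.331583] ata2: hard resetting port
Apr 22 23:37:14 backupserver kernel: [91222.368598] ata2: hard resetting port
Apr 22 23:38:04 backupserver kernel: [91272.374659] end_request: I/O error, dev sdb, sector 35277535
Apr 22 23:38:04 backupserver kernel: [91272.374682] lost page write due to I/O error on md0
""".splitlines()

if __name__ == "__main__":
    resets, bad_sectors, lost_writes = summarize(SAMPLE)
    print("hard resets:", dict(resets))
    print("failed sectors:", bad_sectors)
    print("lost page writes:", dict(lost_writes))
```

To run it against a full log, pass the open file object to summarize() in place of SAMPLE.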
The two 500GB Samsung HD501LJ hard-drives were making resetting
sounds at regular intervals, trying to recover from the error,
unsuccessfully. The system was accessed via network/SSH and was
shut down "gracefully" via shutdown -h now.
After restarting, the system seemingly continued to operate
normally without any apparent data loss.
One thing of note is that the south-bridge was alarmingly hot
to the touch (you could "burn your finger" on it) so I would
attribute the problems to improper cooling of hardware.
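To check the overheating theory on a comparable board, one could log whatever temperatures the kernel exposes and compare them with the time of the failures. A minimal sketch, assuming the board's sensor chip has a hwmon driver loaded (not confirmed for this ASRock board) and that readings appear under /sys/class/hwmon as temp*_input files in millidegrees Celsius; the function name read_temps is made up for illustration:

```python
from pathlib import Path


def read_temps(base="/sys/class/hwmon"):
    """Return {"hwmonN/tempM_input": degrees Celsius} for every readable sensor.

    Assumption: sensor files sit directly under hwmon*/; some older kernels
    may place them under hwmon*/device/ instead, which this sketch ignores.
    """
    temps = {}
    for sensor in sorted(Path(base).glob("hwmon*/temp*_input")):
        try:
            millidegrees = int(sensor.read_text().strip())
        except (OSError, ValueError):
            # Unreadable or malformed sensor file: skip it.
            continue
        temps[f"{sensor.parent.name}/{sensor.name}"] = millidegrees / 1000.0
    return temps


if __name__ == "__main__":
    for name, celsius in read_temps().items():
        print(f"{name}: {celsius:.1f} C")
```

Whether any of the reported sensors actually sits near the southbridge depends on the board, so a hot reading here would only be a hint, not proof.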
Previously the system had uptimes of 100+ days as a render farm
master using Windows 2000 (mostly CPU/memory load, though).
I won't be able to test the same system further as its
motherboard will be (promptly:p) exchanged.
P.S. Please keep me in CC, as I am not following the list.
--
Best regards,
speedy mailto:speedy@3d-io.com
* Re: [BUG REPORT, 2.6.22] sata controler failure on nforce 2 chipset
From: Andrew Morton @ 2008-04-26 6:11 UTC
To: speedy; +Cc: linux-kernel, linux-ide
> On Wed, 23 Apr 2008 01:14:59 +0200 speedy <speedy@3d-io.com> wrote:
> Hello Linux kernel crew,
>
> [Consider this more as a datapoint than a bug report, as after
> one network issue and one sata/southbridge issue showing up
> intermittently, the ASRock motherboard involved will be
> scrapped for a different one]
>
> The integrated NVidia sata controller and/or the hard-drive has failed
> during operation with the following output:
2.6.22 is rather old. Can you please retest 2.6.25?
Thanks.
* Re[2]: [BUG REPORT, 2.6.22] sata controler failure on nforce 2 chipset
From: speedy @ 2008-04-29 0:55 UTC
To: Andrew Morton; +Cc: linux-kernel, linux-ide
Hello Andrew,
Saturday, April 26, 2008, 8:11:08 AM, you wrote:
>> On Wed, 23 Apr 2008 01:14:59 +0200 speedy <speedy@3d-io.com> wrote:
>> Hello Linux kernel crew,
>>
>> [Consider this more as a datapoint than a bug report, as after
>> one network issue and one sata/southbridge issue showing up
>> intermittently, the ASRock motherboard involved will be
>> scrapped for a different one]
>>
>> The integrated NVidia sata controller and/or the hard-drive has failed
>> during operation with the following output:
AM> 2.6.22 is rather old. Can you please retest 2.6.25?
Unfortunately not, the motherboard has been changed for a
different one in that server, as I needed to deploy it. The
system is now behaving properly.
If any of the kernel developers is interested in toying with
an NForce 2 motherboard which probably overheats the southbridge
and crashes approx. once a day under I/O load, I could ask the
management to donate it.
It reproduces:
* "RX unit hang detected" in i1000 drivers
* SATA soft-raid (infinite?) HDD resetting loop
;)
Cheers!
--
Best regards,
speedy mailto:speedy@3d-io.com
* Re: Re[2]: [BUG REPORT, 2.6.22] sata controler failure on nforce 2 chipset
From: Oliver Pinter @ 2008-04-29 14:48 UTC
To: speedy; +Cc: Andrew Morton, linux-kernel, linux-ide
Hi, is the cooling of the southbridge OK?
Or, if you are in the mood for testing, or want to use a newer 2.6.22.y, then:
http://repo.or.cz/w/linux-2.6.22.y-op.git
http://students.zipernowsky.hu/~oliverp/kernel-stable/
Sorry for my bad English.
On 4/29/08, speedy <speedy@3d-io.com> wrote:
> Hello Andrew,
>
> Saturday, April 26, 2008, 8:11:08 AM, you wrote:
>
> >> On Wed, 23 Apr 2008 01:14:59 +0200 speedy <speedy@3d-io.com> wrote:
> >> Hello Linux kernel crew,
> >>
> >> [Consider this more as a datapoint than a bug report, as after
> >> one network issue and one sata/southbridge issue showing up
> >> intermittently, the ASRock motherboard involved will be
> >> scrapped for a different one]
> >>
> >> The integrated NVidia sata controller and/or the hard-drive has failed
> >> during operation with the following output:
>
> AM> 2.6.22 is rather old. Can you please retest 2.6.25?
>
> Unfortunately not, the motherboard has been changed for a
> different one in that server, as I needed to deploy it. The
> system is now behaving properly.
>
> If any of the kernel developers is interested in toying with
> an NForce 2 motherboard which probably overheats the southbridge
> and crashes approx. once a day under I/O load, I could ask the
> management to donate it.
>
> It reproduces:
>
> * "RX unit hang detected" in i1000 drivers
> * SATA soft-raid (infinite?) HDD resetting loop
>
> ;)
>
> Cheers!
>
>
> --
> Best regards,
> speedy mailto:speedy@3d-io.com
>
--
Thanks,
Oliver