[BUG REPORT, 2.6.22] sata controler failure on nforce 2 chipset


* [BUG REPORT, 2.6.22] sata controler failure on nforce 2 chipset
@ 2008-04-22 23:14 speedy
  2008-04-26  6:11 ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: speedy @ 2008-04-22 23:14 UTC
  To: linux-kernel

Hello Linux kernel crew,

       [Consider this more as a datapoint than a bug report: after
       one network issue and one SATA/southbridge issue showing up
       intermittently, the ASRock motherboard involved will be
       scrapped for a different one]

       The integrated NVidia SATA controller and/or the hard drive failed
       during operation with the following output:

Apr 22 23:36:54 backupserver kernel: [91202.294632]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 22 23:36:59 backupserver kernel: [91207.657630] ata2: port is slow to respond, please be patient (Status 0xd0)
Apr 22 23:37:04 backupserver kernel: [91212.331576] ata2: device not ready (errno=-16), forcing hardreset
Apr 22 23:37:04 backupserver kernel: [91212.331583] ata2: hard resetting port
Apr 22 23:37:09 backupserver kernel: [91217.874396] ata2: port is slow to respond, please be patient (Status 0x80)
Apr 22 23:37:14 backupserver kernel: [91222.368598] ata2: hard resetting port
Apr 22 23:37:19 backupserver kernel: [91227.911395] ata2: port is slow to respond, please be patient (Status 0x80)
Apr 22 23:37:24 backupserver kernel: [91232.405597] ata2: hard resetting port
Apr 22 23:37:29 backupserver kernel: [91237.948395] ata2: port is slow to respond, please be patient (Status 0x80)
Apr 22 23:37:59 backupserver kernel: [91267.370311] ata2: hard resetting port
Apr 22 23:38:04 backupserver kernel: [91272.373843] ata2.00: disabled
Apr 22 23:38:04 backupserver kernel: [91272.373858] ata2: EH complete
Apr 22 23:38:04 backupserver kernel: [91272.374653] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Apr 22 23:38:04 backupserver kernel: [91272.374659] end_request: I/O error, dev sdb, sector 35277535
Apr 22 23:38:04 backupserver kernel: [91272.374682] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374706] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374726] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374745] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374765] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374785] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374805] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374825] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374844] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374864] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.375058] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Apr 22 23:38:04 backupserver kernel: [91272.375062] end_request: I/O error, dev sdb, sector 35278559
Apr 22 23:38:04 backupserver kernel: [91272.375096] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Apr 22 23:38:04 backupserver kernel: [91272.375099] end_request: I/O error, dev sdb, sector 407240943
.
.
.
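
For readers decoding the excerpt above: the "Status 0xd0" and "Status 0x80"
values are ATA Status register bytes, and errno=-16 corresponds to EBUSY on
Linux. The following is a minimal Python sketch, not part of the original
report, that names the bits using the classic ATA bit names. The helper
name decode_ata_status is hypothetical. Note that while BSY is set the
other bits are formally not valid, so 0xd0 mostly says "still busy".

```python
import errno

# Classic ATA Status register bit names (bits 1, 2 and 4 were
# redefined or made obsolete in later ATA revisions).
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete (classic name)
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data (obsolete)
    (0x02, "IDX"),   # index (obsolete)
    (0x01, "ERR"),   # error
]


def decode_ata_status(status):
    """Return the names of the bits set in an ATA Status byte (hypothetical helper)."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


print(decode_ata_status(0xD0))  # ['BSY', 'DRDY', 'DSC']
print(decode_ata_status(0x80))  # ['BSY']
print(errno.errorcode.get(16))  # EBUSY on Linux
```

Both values seen in the log have BSY set, which is consistent with a port
that never finishes its reset.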

       Full /var/log/messages can be found on: http://87.230.23.147/messages_sata_crash.txt
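
A log like this can be summarized mechanically. The sketch below is an
illustration only, assuming the message formats shown in the excerpt above
(they differ between kernel versions). The function name summarize and the
result keys are hypothetical; the sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# A few lines copied verbatim from the excerpt in this report.
SAMPLE_LOG = """\
Apr 22 23:37:04 backupserver kernel: [91212.331583] ata2: hard resetting port
Apr 22 23:37:14 backupserver kernel: [91222.368598] ata2: hard resetting port
Apr 22 23:38:04 backupserver kernel: [91272.373843] ata2.00: disabled
Apr 22 23:38:04 backupserver kernel: [91272.374659] end_request: I/O error, dev sdb, sector 35277535
Apr 22 23:38:04 backupserver kernel: [91272.374682] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.375062] end_request: I/O error, dev sdb, sector 35278559
"""

# Patterns assume the 2.6.22-era message wording seen above.
RESET_RE = re.compile(r"\b(ata\d+): hard resetting port")
DISABLED_RE = re.compile(r"\b(ata\d+\.\d+): disabled")
IO_ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
LOST_WRITE_RE = re.compile(r"lost page write due to I/O error on (\w+)")


def summarize(lines):
    """Count resets, disabled devices, failed sectors and lost writes."""
    resets = Counter()
    disabled = []
    failed_sectors = {}
    lost_writes = Counter()
    for line in lines:
        m = RESET_RE.search(line)
        if m:
            resets[m.group(1)] += 1
            continue
        m = DISABLED_RE.search(line)
        if m:
            disabled.append(m.group(1))
            continue
        m = IO_ERROR_RE.search(line)
        if m:
            failed_sectors.setdefault(m.group(1), []).append(int(m.group(2)))
            continue
        m = LOST_WRITE_RE.search(line)
        if m:
            lost_writes[m.group(1)] += 1
    return {
        "hard_resets": resets,
        "disabled": disabled,
        "failed_sectors": failed_sectors,
        "lost_writes": lost_writes,
    }


summary = summarize(SAMPLE_LOG.splitlines())
print(summary["hard_resets"]["ata2"])   # 2
print(summary["failed_sectors"]["sdb"])  # [35277535, 35278559]
```

To run it over a saved copy of the full log, pass the open file object to
summarize() instead of the sample lines.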

       The two 500GB Samsung HD501LJ hard drives were making resetting
       sounds at regular intervals, trying unsuccessfully to recover
       from the error. The system was accessed via network/SSH and was
       shut down "gracefully" via shutdown -h now.

       After restarting, the system seemingly continued to operate
       normally without any apparent data loss.

       One thing of note is that the south-bridge was alarmingly hot
       to the touch (you could "burn your finger" on it) so I would
       attribute the problems to improper cooling of hardware.
       Previously the system had uptimes of 100+ days as a render farm
       master using Windows 2000 (mostly CPU/memory load, though).

       I won't be able to test the same system further as its
       motherboard will be (promptly:p) exchanged.

       P.S. Please keep me in CC:, as I am not following the list.

-- 
Best regards,
 speedy                          mailto:speedy@3d-io.com


* Re: [BUG REPORT, 2.6.22] sata controler failure on nforce 2 chipset
  2008-04-22 23:14 [BUG REPORT, 2.6.22] sata controler failure on nforce 2 chipset speedy
@ 2008-04-26  6:11 ` Andrew Morton
  2008-04-29  0:55   ` Re[2]: " speedy
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2008-04-26  6:11 UTC
  To: speedy; +Cc: linux-kernel, linux-ide

> On Wed, 23 Apr 2008 01:14:59 +0200 speedy <speedy@3d-io.com> wrote:
> Hello Linux kernel crew,
> 
>        [Consider this more as a datapoint than a bug report: after
>        one network issue and one SATA/southbridge issue showing up
>        intermittently, the ASRock motherboard involved will be
>        scrapped for a different one]
> 
>        The integrated NVidia SATA controller and/or the hard drive failed
>        during operation with the following output:

2.6.22 is rather old.  Can you please retest 2.6.25?

Thanks.


* Re[2]: [BUG REPORT, 2.6.22] sata controler failure on nforce 2 chipset
  2008-04-26  6:11 ` Andrew Morton
@ 2008-04-29  0:55   ` speedy
  2008-04-29 14:48     ` Oliver Pinter
  0 siblings, 1 reply; 4+ messages in thread
From: speedy @ 2008-04-29  0:55 UTC
  To: Andrew Morton; +Cc: linux-kernel, linux-ide

Hello Andrew,

Saturday, April 26, 2008, 8:11:08 AM, you wrote:

>> On Wed, 23 Apr 2008 01:14:59 +0200 speedy <speedy@3d-io.com> wrote:
>> Hello Linux kernel crew,
>> 
>>        [Consider this more as a datapoint than a bug report: after
>>        one network issue and one SATA/southbridge issue showing up
>>        intermittently, the ASRock motherboard involved will be
>>        scrapped for a different one]
>> 
>>        The integrated NVidia SATA controller and/or the hard drive failed
>>        during operation with the following output:

AM> 2.6.22 is rather old.  Can you please retest 2.6.25?

    Unfortunately not: the motherboard in that server has been swapped
    for a different one, as I needed to deploy it. The system is now
    behaving properly.

    If any of the kernel developers is interested in toying with
    an NForce 2 motherboard which probably overheats the southbridge
    and crashes approx. once a day under I/O load, I could ask the
    management to donate it.

    It reproduces:

    * "RX unit hang detected" in i1000 drivers
    * SATA soft-raid (infinite?) HDD resetting loop

    ;)

    Cheers!


-- 
Best regards,
 speedy                            mailto:speedy@3d-io.com



* Re: Re[2]: [BUG REPORT, 2.6.22] sata controler failure on nforce 2 chipset
  2008-04-29  0:55   ` Re[2]: " speedy
@ 2008-04-29 14:48     ` Oliver Pinter
  0 siblings, 0 replies; 4+ messages in thread
From: Oliver Pinter @ 2008-04-29 14:48 UTC
  To: speedy; +Cc: Andrew Morton, linux-kernel, linux-ide

Hi, is the cooling of the southbridge adequate?

Or, if you are willing to test or use a newer 2.6.22.y kernel, see:
http://repo.or.cz/w/linux-2.6.22.y-op.git
http://students.zipernowsky.hu/~oliverp/kernel-stable/

Sorry for my bad English.

On 4/29/08, speedy <speedy@3d-io.com> wrote:
> Hello Andrew,
>
> Saturday, April 26, 2008, 8:11:08 AM, you wrote:
>
> >> On Wed, 23 Apr 2008 01:14:59 +0200 speedy <speedy@3d-io.com> wrote:
> >> Hello Linux kernel crew,
> >>
> >>        [Consider this more as a datapoint than a bug report: after
> >>        one network issue and one SATA/southbridge issue showing up
> >>        intermittently, the ASRock motherboard involved will be
> >>        scrapped for a different one]
> >>
> >>        The integrated NVidia SATA controller and/or the hard drive failed
> >>        during operation with the following output:
>
> AM> 2.6.22 is rather old.  Can you please retest 2.6.25?
>
>     Unfortunately not: the motherboard in that server has been swapped
>     for a different one, as I needed to deploy it. The system is now
>     behaving properly.
>
>     If any of the kernel developers is interested in toying with
>     an NForce 2 motherboard which probably overheats the southbridge
>     and crashes approx. once a day under I/O load, I could ask the
>     management to donate it.
>
>     It reproduces:
>
>     * "RX unit hang detected" in i1000 drivers
>     * SATA soft-raid (infinite?) HDD resetting loop
>
>     ;)
>
>     Cheers!
>
>
> --
> Best regards,
>  speedy                            mailto:speedy@3d-io.com


-- 
Thanks,
Oliver
