From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756742AbYESJFX (ORCPT ); Mon, 19 May 2008 05:05:23 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751448AbYESJFG (ORCPT ); Mon, 19 May 2008 05:05:06 -0400
Received: from cardassian3.kabelfoon.nl ([62.45.45.105]:62563 "EHLO cardassian.kabelfoon.nl" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751439AbYESJFD (ORCPT ); Mon, 19 May 2008 05:05:03 -0400
X-Greylist: delayed 1736 seconds by postgrey-1.27 at vger.kernel.org; Mon, 19 May 2008 05:05:02 EDT
Message-ID: <48313BF3.3080605@caiway.nl>
Date: Mon, 19 May 2008 10:36:03 +0200
From: Jan Evert van Grootheest
User-Agent: Thunderbird 2.0.0.14 (X11/20080505)
MIME-Version: 1.0
To: linux-ide@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: SATA disk dies and revives after boot
Content-Type: multipart/signed; protocol="application/x-pkcs7-signature"; micalg=sha1; boundary="------------ms040602080208020500080009"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

This is a cryptographically signed message in MIME format.

--------------ms040602080208020500080009
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi,

Yesterday the below happened on my home Xen server; dom0 is Debian stable with Ubuntu kernel 2.6.24-16-xen. Since yesterday was Sunday, the machine was mostly idle (I guess we were somewhere between church and home at the time).

After this, the disk does not respond to anything and needs a reboot to return to sanity. After that it may work for some period of time (days or weeks).

I recently ran a long SMART test and it returned no errors. Also, after a reboot the disk seems to be just fine (except that I need to re-add it to the RAID1 arrays).

I've also had this disk connected to a Promise controller. The same thing happened there.
Previously, using 2.6.18, it would do this as well.

May 18 13:06:15 quark kernel: [174871.044304] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
May 18 13:06:15 quark kernel: [174871.044353] ata5.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
May 18 13:06:15 quark kernel: [174871.044355] res 40/00:00:01:01:80/00:00:00:00:00/00 Emask 0x4 (timeout)
May 18 13:06:15 quark kernel: [174871.044412] ata5.00: status: { DRDY }
May 18 13:06:20 quark kernel: [174876.082713] ata5: port is slow to respond, please be patient (Status 0xd0)
May 18 13:06:25 quark kernel: [174881.065279] ata5: soft resetting link
May 18 13:06:55 quark kernel: [174911.291301] ata5.00: qc timeout (cmd 0xec)
May 18 13:06:55 quark kernel: [174911.291337] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
May 18 13:06:55 quark kernel: [174911.291361] ata5.00: revalidation failed (errno=-5)
May 18 13:06:55 quark kernel: [174911.291384] ata5: failed to recover some devices, retrying in 5 secs
May 18 13:07:05 quark kernel: [174921.328542] ata5: port is slow to respond, please be patient (Status 0xd0)
May 18 13:07:10 quark kernel: [174926.312085] ata5: soft resetting link
May 18 13:07:40 quark kernel: [174956.537601] ata5.00: qc timeout (cmd 0xec)
May 18 13:07:40 quark kernel: [174956.537638] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
May 18 13:07:40 quark kernel: [174956.537662] ata5.00: revalidation failed (errno=-5)
May 18 13:07:40 quark kernel: [174956.537685] ata5: failed to recover some devices, retrying in 5 secs
May 18 13:07:50 quark kernel: [174966.580807] ata5: port is slow to respond, please be patient (Status 0xd0)
May 18 13:07:55 quark kernel: [174971.564289] ata5: soft resetting link
May 18 13:08:26 quark kernel: [175001.790832] ata5.00: qc timeout (cmd 0xec)
May 18 13:08:26 quark kernel: [175001.790867] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
May 18 13:08:26 quark kernel: [175001.790891] ata5.00: revalidation failed (errno=-5)
May 18 13:08:26 quark kernel: [175001.790914] ata5.00: disabled
May 18 13:08:31 quark kernel: [175007.327614] ata5: port is slow to respond, please be patient (Status 0xd0)
May 18 13:08:36 quark kernel: [175012.311144] ata5: soft resetting link
May 18 13:08:36 quark kernel: [175012.478592] ata5: EH complete
May 18 13:08:36 quark kernel: [175012.478684] sd 4:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
May 18 13:08:36 quark kernel: [175012.478726] end_request: I/O error, dev sdb, sector 62412332
May 18 13:08:36 quark kernel: [175012.478751] md: super_written gets error=-5, uptodate=0
May 18 13:08:36 quark kernel: [175012.478777] raid1: Disk failure on sdb5, disabling device.

The ata/disk info from dmesg:

[ 4.716187] sata_via 0000:00:0f.0: version 2.3
[ 4.716429] sata_via 0000:00:0f.0: routed to hard irq line 10
[ 4.720203] scsi3 : sata_via
[ 4.721459] scsi4 : sata_via
[ 4.721675] ata5: SATA max UDMA/133 cmd 0xd400 ctl 0xd000 bmdma 0xcc08 irq 20
[ 5.135540] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 5.299951] ata5.00: ATA-7: Maxtor 6Y080M0, YAR511W0, max UDMA/133
[ 5.300033] ata5.00: 160086528 sectors, multi 16: LBA
[ 5.315957] ata5.00: configured for UDMA/133
[ 5.316356] sd 4:0:0:0: [sdb] 160086528 512-byte hardware sectors (81964 MB)
[ 5.316448] sd 4:0:0:0: [sdb] Write Protect is off
[ 5.316526] sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 5.316543] sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 5.316684] sd 4:0:0:0: [sdb] 160086528 512-byte hardware sectors (81964 MB)
[ 5.316772] sd 4:0:0:0: [sdb] Write Protect is off
[ 5.316850] sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 5.316865] sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 5.316960] sdb: sdb2 < sdb5 sdb6 sdb7 sdb8 >
[ 5.404184] sd 4:0:0:0: [sdb] Attached SCSI disk

It is this sata controller:

00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
        Subsystem: Micro-Star International Co., Ltd. K8T Neo 2 Motherboard
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR-