From mboxrd@z Thu Jan 1 00:00:00 1970
From: Robert Hancock
Subject: Re: ata8.00: log page 10h reported inactive tag 0
Date: Wed, 17 Feb 2010 18:25:15 -0600
Message-ID: <4B7C88EB.1040502@gmail.com>
References: <87y6isp4da.fsf@nemi.mork.no>
In-Reply-To: <87y6isp4da.fsf@nemi.mork.no>
Sender: linux-ide-owner@vger.kernel.org
List-Id: linux-ide@vger.kernel.org
To: Bjørn Mork
Cc: linux-ide@vger.kernel.org

On 02/17/2010 04:02 AM, Bjørn Mork wrote:
> I'm trying to debug a problem I've been having a couple of times over
> the last few weeks, where this machine hangs *really* hard: No console
> output at all, and absolutely no response on the console - not even
> using the magic SysRq key (using "break" on the serial console. This is
> tested and known to be fully functional under normal conditions).
>
> I really have no clue where to start to locate the cause of this, but
> after rebooting the last time there were a few libata error messages
> which puzzle me so I might as well start here. Understandig these
> errors will be useful in itself. And they are related to the last
> pieces of hardware added (SiI 3132 controller attached to a SiI 4726
> port multiplier with 3 disks), which make them more suspicious in my
> eyes...
>
> These are the messages I worry about (full dmesg is included below):
>
> [ 63.865723] ata8.00: log page 10h reported inactive tag 0
> [ 63.943744] ata8.00: exception Emask 0x1 SAct 0x3c SErr 0x0 action 0x0
> [ 64.107789] ata8.00: irq_stat 0x03060002, device error via SDB FIS
> [ 64.199764] ata8.00: cmd 60/08:10:00:02:00/00:00:00:00:00/40 tag 2 ncq 4096 in
> [ 64.199765]          res 60/08:10:00:02:00/00:00:00:00:00/40 Emask 0x1 (device error)
> [ 64.580730] ata8.00: status: { DRDY DF }
> [ 64.628731] ata8.00: cmd 60/78:18:08:02:00/00:00:00:00:00/40 tag 3 ncq 61440 in
> [ 64.628732]          res 60/78:18:08:02:00/00:00:00:00:00/40 Emask 0x89 (media error)
> [ 64.810084] ata8.00: status: { DRDY DF }
> [ 64.879835] ata8.00: error: { UNC IDNF }
> [ 64.930935] ata8.00: cmd 60/18:20:e8:00:00/00:00:00:00:00/40 tag 4 ncq 12288 in
> [ 64.930936]          res 56/1b:02:02:00:00/00:00:00:40:56/00 Emask 0x1 (device error)
> [ 65.204495] ata8.00: status: { DRDY }
> [ 65.248497] ata8.00: error: { IDNF }
> [ 65.292407] ata8.00: cmd 60/80:28:80:01:00/00:00:00:00:00/40 tag 5 ncq 65536 in
> [ 65.292408]          res 56/1b:02:02:00:00/00:00:00:50:56/00 Emask 0x1 (device error)
> [ 65.590268] ata8.00: status: { DRDY }
> [ 65.658016] ata8.00: error: { IDNF }
> [ 65.725346] ata8.00: configured for UDMA/100
> [ 65.780607] sd 7:0:0:0: [sdd] Device not ready: Sense Key : Not Ready [current] [descriptor]
> [ 65.925384] sd 7:0:0:0: [sdd] Device not ready: Add. Sense: Logical unit not ready, cause not reportable
> [ 66.040355] end_request: I/O error, dev sdd, sector 520
> [ 66.103442] ata8: EH complete
> [ 66.137877] sd 7:0:0:0: [sdd] 3907029168 512-byte hardware sectors (2000399 MB)
> [ 66.228486] sd 7:0:0:0: [sdd] Write Protect is off
> [ 66.285654] sd 7:0:0:0: [sdd] Mode Sense: 00 3a 00 00
> [ 66.285676] sd 7:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>
>
> These errors appeared after power cycling the hanging machine, and may
> therefore just as well be symptoms as a cause. As you see, the error
> handling is successful and all drives are working as expected. So I
> guess this might just be a harmless warning caused by an unrelated hang
> and the unexpected power cycling. Anyway, here are the details of this
> controller in case they are of interest:

Well, that error indicates a read error on some sectors reported by the
drive. This could be caused by a hard power-down in the middle of a
write to those sectors - in that case, one can in principle use
something like hdparm --write-sector to rewrite the sector correctly.
However, it could also be due to a drive fault.
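
To find which sector one would pass to hdparm, the "cmd" lines in the quoted log can be decoded by hand or with a short script. The following is a minimal sketch, not part of the original message. It assumes the usual libata print order for the taskfile (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the NCQ convention that the sector count travels in the feature fields and the tag in the upper bits of nsect. The function name decode_ncq_cmd is made up for illustration.

```python
import re

# Assumed libata "cmd" layout (an assumption, see the note above):
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
TASKFILE = re.compile(
    r"cmd\s+([0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/([0-9a-f]{2})"
)

NCQ_COMMANDS = (0x60, 0x61)  # READ / WRITE FPDMA QUEUED


def decode_ncq_cmd(line):
    """Return LBA, sector count and tag for an NCQ 'cmd' log line (hypothetical helper)."""
    m = TASKFILE.search(line)
    if m is None:
        raise ValueError("no taskfile found in line")
    command = int(m.group(1), 16)
    if command not in NCQ_COMMANDS:
        raise ValueError("not an NCQ read/write command")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in m.group(2).split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in m.group(3).split(":")
    )
    # 48-bit LBA: the "hob" (high order byte) fields hold bits 24..47.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    # For NCQ the sector count is carried in feature; 0 means 65536 sectors.
    sectors = ((hob_feature << 8) | feature) or 65536
    # The NCQ tag occupies bits 7:3 of the sector count register.
    tag = nsect >> 3
    return {"lba": lba, "sectors": sectors, "tag": tag}


if __name__ == "__main__":
    line = "ata8.00: cmd 60/78:18:08:02:00/00:00:00:00:00/40 tag 3 ncq 61440 in"
    info = decode_ncq_cmd(line)
    print(info, "bytes:", info["sectors"] * 512)
```

Under these assumptions the command flagged as a media error starts at LBA 520 and covers 120 sectors (61440 bytes), which agrees with both the "ncq 61440" field and the later "end_request: I/O error, dev sdd, sector 520" line. Checking a suspect sector with hdparm --read-sector first is a sensible precaution, since --write-sector overwrites it.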