linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Tim Connors <tconnors@rather.puzzling.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-ide@vger.kernel.org
Subject: Re: sata error on ICH8M
Date: Tue, 11 Nov 2008 23:01:13 +0900	[thread overview]
Message-ID: <49199029.8070802@kernel.org> (raw)
In-Reply-To: <alpine.DEB.2.00.0811040833110.24370@dirac.rather.puzzling.org>

Hello,

Tim Connors wrote:
> I'm running a debian 2.6.26-9 kernel (sid) on a new laptop with:

Hmmm...

> 00:1f.2 SATA controller: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA AHCI Controller (rev 03) (prog-if 01 [AHCI 1.0])
>         Subsystem: Dell Device 0275
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0
>         Interrupt: pin B routed to IRQ 378
>         Region 0: I/O ports at 1c00 [size=8]
>         Region 1: I/O ports at 18d4 [size=4]
>         Region 2: I/O ports at 18d8 [size=8]
>         Region 3: I/O ports at 18d0 [size=4]
>         Region 4: I/O ports at 18e0 [size=32]
>         Region 5: Memory at f8504000 (32-bit, non-prefetchable) [size=2K]
>         Capabilities: [80] Message Signalled Interrupts: Mask- 64bit- Queue=0/2 Enable+
>                 Address: fee0300c  Data: 41a1
>         Capabilities: [70] Power Management version 3
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
>                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [a8] SATA HBA <?>
>         Kernel driver in use: ahci
>         Kernel modules: ahci
> 
> Soon after booting, and possibly after the disk had spun down and was
> asked to spin back up (which it had done successfully a few times so
> far, with a spinddown timeout of 10 minutes, and using laptop_mode), it
> had a sata error:
> 
> Nov  4 01:49:29 gamow kernel: [ 1865.106289] ata1.00: exception Emask 0x10 SAct 0x3ff SErr 0x50000 action 0xe frozen
> Nov  4 01:50:29 gamow kernel: [ 1865.106307] ata1: SError: { PHYRdyChg CommWake }
> Nov  4 01:50:29 gamow kernel: [ 1865.106319] ata1.00: cmd 61/08:00:bd:c9:57/00:00:00:00:00/40 tag 0 ncq 4096 out
> Nov  4 01:50:29 gamow kernel: [ 1865.106322]          res 40/00:00:02:4f:c2/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
> Nov  4 01:50:29 gamow kernel: [ 1865.106329] ata1.00: status: { DRDY }
> Nov  4 01:50:29 gamow kernel: [ 1865.106339] ata1.00: cmd 61/08:08:35:7a:73/00:00:00:00:00/40 tag 1 ncq 4096 out
> Nov  4 01:50:29 gamow kernel: [ 1865.106343]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
> Nov  4 01:50:29 gamow kernel: [ 1865.106350] ata1.00: status: { DRDY }
> Nov  4 01:50:29 gamow kernel: [ 1865.106361] ata1.00: cmd 61/08:10:35:36:9b/00:00:00:00:00/40 tag 2 ncq 4096 out
> Nov  4 01:50:29 gamow kernel: [ 1865.106364]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
> Nov  4 01:50:29 gamow kernel: [ 1865.106371] ata1.00: status: { DRDY }
> Nov  4 01:50:29 gamow kernel: [ 1865.106382] ata1.00: cmd 61/28:18:55:94:e2/00:00:00:00:00/40 tag 3 ncq 20480 out
> 
> And then it fails to reset after some time, remounting the devices
> readonly (although what wasn't already in the cache became unreadable with
> lots of IO errors rapidly filling up dmesg).  None of the logs made it to
> disk, naturally enough, and this was what I caught in syslog before syslog
> bailed.  There were interesting messages that happened after this, but the
> dmesg buffer filled up before I thought about saving them.

Well, the disk is already a goner at that point so you need to either
set up a netconsole or plug in a usb stick, mount it and do "while
true; do dmesg -c >> /mnt/usbstick/dmesg.out; sleep 1; done" to
capture the kernel log.

> My /sys/class/scsi_host/host0/link_power_management_policy is:
> min_power
> .  powertop had earlier (in a previous warm-boot) prompted me
> to set link_power_management_policy, so I had been tweaking that, but I
> didn't look at its default setting - I presume it was already at
> min_power, as it is now from a fresh (cold) bootup.
> 
> Just in case it is related, in each bootup, I earlier get
> Nov  4 01:20:26 gamow kernel: [  116.217492] CE: hpet increasing min_delta_ns to 15000 nsec
> Nov  4 01:20:28 gamow kernel: [  118.073078] CE: hpet increasing min_delta_ns to 22500 nsec
> Nov  4 01:20:30 gamow kernel: [  120.467250] ACPI: EC: missing confirmations, switch off interrupt mode.
> Nov  4 01:20:30 gamow kernel: [  120.603593] CE: hpet increasing min_delta_ns to 33750 nsec
> Nov  4 01:20:31 gamow kernel: [  121.004372] ACPI Exception (evregion-0420): AE_TIME, Returned by Handler for [EmbeddedControl] [20080321]
> Nov  4 01:20:31 gamow kernel: [  121.004372] ACPI Error (psparse-0530): Method parse/execution failed [\_SB_.PCI0.LPCB.BAT1._BST] (Node ffff81013fa6cb90), AE_TIME
> Nov  4 01:20:31 gamow kernel: [  121.004372] ACPI Exception (battery-0360): AE_TIME, Evaluating _BST [20080321]
> 
> and/or
> 
> [ 1587.842640] CE: hpet increasing min_delta_ns to 15000 nsec
> (successively increasing as time goes on)
> 
> The sata link went belly up perhaps a few minutes after I went to bed
> lastnight, and the only thing I can think of that I did before then was to
> unplug and replug the ethernet and/or the wireless.  The ACPI messages
> seem to happen around networking events on this laptop, but I haven't had
> a chance to investigate further.

I really need to see how the recovery attempt failed.  Can you please
reproduce the problem and report the kernel log?  Also, please report
how reproducible the problem is.

Thanks.

-- 
tejun

  parent reply	other threads:[~2008-11-11 14:01 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-04 11:26 sata error on ICH8M Tim Connors
2008-11-04 11:52 ` sata error on ICH8M (Dell Vostro 1710 laptop) Tim Connors
2008-11-11 14:01 ` Tejun Heo [this message]
2008-11-11 14:32   ` sata error on ICH8M Mark Lord
2008-11-12  1:43     ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49199029.8070802@kernel.org \
    --to=tj@kernel.org \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tconnors@rather.puzzling.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).