Re: ATA resets with Intel 8/C220 and HGST drive


From: "Selim T. Erdoğan" <selim@alumni.cs.utexas.edu>
To: Nicolas George <george@nsup.org>
Cc: debian-user@lists.debian.org, linux-kernel@vger.kernel.org
Subject: Re: ATA resets with Intel 8/C220 and HGST drive
Date: Wed, 29 Apr 2015 22:59:58 +0300
Message-ID: <20150429195958.GA15167@side>
In-Reply-To: <20150427130358.GA522233@phare.normalesup.org>

On Mon, Apr 27, 2015 at 03:03:58PM +0200, Nicolas George wrote:
> Summary: I had annoying resets of the SATA bus with an 8 Series/C220 Series
> Chipset controller and an HGST Travelstar 7K1000 drive. I recently managed to
> stop them and as far as I currently know I am satisfied; I write this mail
> in the hope that it may be useful for anyone having similar issues. If you
> do not have that issue and you are not a developer interested in fixing the
> issue more permanently, you can stop reading right now.
> 
> Here are the details. The computer is a Zotac ZBox ID91 nettop with a
> proprietary motherboard, and, as stated above, a Travelstar 7K1000 hard
> drive (a 7200 RPM 2.5", an unusual beast). It was installed around June
> 2014, and I noticed the problems some time later; they probably started
> right away.
> 
> The distribution was a Debian Jessie (testing) with the packaged kernel,
> probably linux-image-3.14-1-amd64:amd64 at the time; the issue was not fixed
> by upgrades.
> 
> The possibly relevant hardware information is this:
> 
> CPU: Intel(R) Core(TM) i3-4130T CPU @ 2.90GHz
> 
> CPU:
> product: Intel(R) Core(TM) i3-4130T CPU @ 2.90GHz
> 
> description: SATA controller
> product: 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode]
> vendor: Intel Corporation
> physical id: 1f.2
> bus info: pci@0000:00:1f.2
> version: 05
> width: 32 bits
> clock: 66MHz
> capabilities: storage msi pm ahci_1.0 bus_master cap_list
> configuration: driver=ahci latency=0
> resources: irq:42 ioport:f0b0(size=8) ioport:f0a0(size=4) ioport:f090(size=8) ioport:f080(size=4) ioport:f060(size=32) memory:f7d1a000-f7d1a7ff
> 
> description: ATA Disk
> product: HGST HTS721010A9
> physical id: 0.0.0
> bus info: scsi@1:0.0.0
> logical name: /dev/sda
> version: A3J0
> serial: [REMOVED]
> size: 931GiB (1TB)
> capabilities: partitioned partitioned:dos
> configuration: ansiversion=5 logicalsectorsize=512 sectorsize=4096 signature=d3079a6d
> 
> The resets happened a few times a day (this computer is kept on for more
> than a day and suspend is not used), mostly when the disk was in heavy use,
> sometimes as early as during the boot; there were a few good days when they
> did not happen. They were annoying because they caused a freeze of a few
> seconds for anything reading from disk; AFAIK they never resulted in data
> corruption.
> 
> The corresponding kernel messages look like this:
> 
> [  337.466498] ata2: EH complete
> [  367.251032] ata2.00: exception Emask 0x10 SAct 0x80000 SErr 0x400100 action 0x6 frozen
> [  367.251041] ata2.00: irq_stat 0x08000000, interface fatal error
> [  367.251046] ata2: SError: { UnrecovData Handshk }
> [  367.251053] ata2.00: failed command: WRITE FPDMA QUEUED
> [  367.251063] ata2.00: cmd 61/08:98:68:3b:40/00:00:6b:00:00/40 tag 19 ncq 4096 out
> [  367.251063]          res 50/00:08:68:3b:40/00:00:6b:00:00/40 Emask 0x10 (ATA bus error)
> [  367.251068] ata2.00: status: { DRDY }
> [  367.251075] ata2: hard resetting link
> [  367.571128] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [  367.577660] ata2.00: configured for UDMA/133
> [  367.577676] ata2: EH complete
> [  409.772730] ata2: limiting SATA link speed to 3.0 Gbps
> [  409.772735] ata2.00: exception Emask 0x10 SAct 0x3fe00 SErr 0x400100 action 0x6 frozen
> [  409.772736] ata2.00: irq_stat 0x08000000, interface fatal error
> [  409.772737] ata2: SError: { UnrecovData Handshk }
> [  409.772739] ata2.00: failed command: READ FPDMA QUEUED
> [  409.772742] ata2.00: cmd 60/08:48:78:09:41/00:00:01:00:00/40 tag 9 ncq 4096 in
> [  409.772742]          res 50/00:28:e0:a3:04/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
> [  409.772743] ata2.00: status: { DRDY }
> <snip seven similar "failed command...DRDY" blocks>
> [  409.772773] ata2.00: failed command: WRITE FPDMA QUEUED
> [  409.772776] ata2.00: cmd 61/28:88:e0:a3:04/00:00:02:00:00/40 tag 17 ncq 20480 out
> [  409.772776]          res 50/00:28:e0:a3:04/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
> [  409.772777] ata2.00: status: { DRDY }
> [  409.772779] ata2: hard resetting link
> [  410.092732] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
> [  410.097670] ata2.00: configured for UDMA/133
> 
> Last week, hinted by the penultimate line, I tried to lower the speed of the
> SATA link permanently, and it worked. I did this by adding
> "libata.force=2:3.0Gbps" to the kernel command line (configured using
> /etc/default/grub).
> 
> Since then, no reset has happened; I am confident that seven days without
> them is not a coincidence.

I had a similar experience with a Sony Vaio VGN-NS140 laptop (from 2008)
when its hard drive died a few years ago.  

The replacement drives (new or used) that I tried would work for a
little while, usually long enough to install Debian, but would get
corrupted within a few hours.  I would see messages like yours above,
about the link dropping to a lower SATA speed (from 3.0Gbps to 1.5Gbps
in my case), but that didn't keep the drive from getting corrupted.
(Maybe it was trying to auto-negotiate back to a higher speed; I don't
remember.)  I finally solved it like you, by permanently setting the
libata.force option to 1.5Gbps.  It worked, but the new replacement
drive I had bought was an SSD, so I was a little unhappy that I had to
use it at the lower speed.
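
For anyone who wants to script that change, here is a minimal Python
sketch that appends a libata.force setting to the
GRUB_CMDLINE_LINUX_DEFAULT line of an /etc/default/grub-style file.
The function name add_kernel_param and the sample text are made up for
illustration, and the sketch assumes the value is wrapped in double
quotes on a single line.  It only transforms a string; writing the
file back and running update-grub afterwards is left to you.

```python
import re

# Matches a line such as: GRUB_CMDLINE_LINUX_DEFAULT="quiet"
# Assumption: the value is double-quoted and sits on one line.
CMDLINE = re.compile(
    r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")[ \t]*$',
    re.MULTILINE,
)


def add_kernel_param(grub_text, param):
    """Return grub_text with param appended to the default command line.

    The parameter is added only if it is not already present, so the
    function can be applied repeatedly without duplicating it.
    """
    def repl(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    new_text, count = CMDLINE.subn(repl, grub_text)
    if count == 0:
        raise ValueError("no GRUB_CMDLINE_LINUX_DEFAULT line found")
    return new_text


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet"\n'
    # Without a port prefix the limit applies to every port; a form
    # like "2:3.0Gbps" restricts it to port 2, as in the quoted mail.
    print(add_kernel_param(sample, "libata.force=1.5Gbps"), end="")
```
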

In my case, the original hard drive that came out of the machine, a 
Seagate Momentus, had a jumper which set the maximum speed to 1.5Gbps.  
Presumably, Sony knew that the machine wasn't able to handle higher 
speeds or auto-negotiation of the speed, so they set that jumper.
However, the replacement drives I tried didn't have such speed-limiting 
options, so I had to set it in the kernel module option.  (BTW, a few 
months ago I bought a used Thinkpad which came with a Seagate Momentus 
in it so I was able to set the jumper and stick that drive in the Sony, 
freeing up my SSD for use in the Thinkpad, at its "unreduced" speed.)
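
Whichever way the limit is applied (jumper or kernel option), the
"SATA link up" lines in the kernel log show what speed was actually
negotiated.  Below is a rough Python sketch that pulls those lines out
of a log.  The name link_speeds is made up for illustration.  The
decoding of the two register values is my reading of the SATA
register layout (bits 7:4 of SStatus hold the negotiated generation,
bits 7:4 of SControl hold the allowed speed limit, 0 meaning no
limit), so treat that part as an assumption to check against the
specification.

```python
import re

# Matches e.g. "ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)".
# The two register values are printed in hexadecimal.
LINK_UP = re.compile(
    r"(ata\d+): SATA link up ([\d.]+) Gbps "
    r"\(SStatus ([0-9A-Fa-f]+) SControl ([0-9A-Fa-f]+)\)"
)


def link_speeds(log_text):
    """Return (port, gbps, negotiated_gen, limit_gen) per link-up line."""
    results = []
    for m in LINK_UP.finditer(log_text):
        sstatus = int(m.group(3), 16)
        scontrol = int(m.group(4), 16)
        negotiated_gen = (sstatus >> 4) & 0xF  # assumed SPD field
        limit_gen = (scontrol >> 4) & 0xF      # assumed limit, 0 = none
        results.append(
            (m.group(1), float(m.group(2)), negotiated_gen, limit_gen)
        )
    return results


if __name__ == "__main__":
    # The two link-up lines from the log quoted earlier in this mail.
    log = (
        "[  367.571128] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)\n"
        "[  410.092732] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)\n"
    )
    for entry in link_speeds(log):
        print(entry)
```

With the limit in place, every link-up line for the port should report
the reduced speed right from boot instead of first coming up at the
higher one.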

> 
> As I said, I consider the issue closed from my point of view. If someone
> wants to investigate further (for example a kernel hacker to actually fix
> this, or a distro developer to make an automatic work-around), I can give
> some more details, and possibly run a few tests if they do not take much
> time and are not too risky.
> 
> Hope this helps.
> 
> Regards,
> 
> -- 
>   Nicolas George

