public inbox for linux-kernel@vger.kernel.org
* What are these ATA exceptions trying to tell me? [2.6.26] System Events]
From: martin f krafft @ 2009-10-26  8:23 UTC (permalink / raw)
  To: linux kernel mailing list

Dear folks,

I have a new, high-quality, high-performance rack-mounted system, but
every now and then the kernel spews a slew of messages like the
following to syslog:

  ata3: EH in SWNCQ mode,QC:qc_active 0x7FFF sactive 0x7FFF
  ata3: SWNCQ:qc_active 0x1 defer_bits 0x7FFE last_issue_tag 0x0
  dhfis 0x1 dmafis 0x0 sdbfis 0x0
  ata3: ATA_REG 0x40 ERR_REG 0x0
  ata3: tag : dhfis dmafis sdbfis sacitve
  ata3: tag 0x0: 1 0 0 1
  ata3.00: exception Emask 0x0 SAct 0x7fff SErr 0x0 action 0x6 frozen
  ata3.00: cmd 61/18:00:4f:22:b3/00:00:05:00:00/40 tag 0 ncq 12288 out
          res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
  ata3.00: status: { DRDY }
  [...]
  ata3: hard resetting link
  ata3: SRST failed (errno=-19)
  ata3: SATA link down (SStatus 0 SControl 300)
  ata3: failed to recover some devices, retrying in 5 secs
  ata3: hard resetting link
  ata3: link is slow to respond, please be patient (ready=-19)
  ata3: SRST failed (errno=-16)
  ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  ata3.00: configured for UDMA/133
  ata3: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4
  ata3: hot plug
  ata3.00: configured for UDMA/133
  sd 2:0:0:0: [sdc] 1953525168 512-byte hardware sectors (1000205 MB)
  sd 2:0:0:0: [sdc] Write Protect is off
  sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
  sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
  sd 2:0:0:0: [sdc] 1953525168 512-byte hardware sectors (1000205 MB)
  sd 2:0:0:0: [sdc] Write Protect is off
  sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
  sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

This happens for ata[34], but never for ata[12].
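The `cmd` line in the log above is a dump of the ATA taskfile registers for the command that failed. The Python sketch below decodes it, assuming the field order used by libata's error handler (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and knowing only a handful of opcodes. `decode_taskfile`, `COMMAND_NAMES` and `NCQ_COMMANDS` are hypothetical helpers for illustration, not part of any kernel or smartmontools tool.

```python
import re

# Assumed field order of libata's "cmd" line:
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
TASKFILE_RE = re.compile(
    r"cmd\s+(?P<cmd>[0-9a-f]{2})/"
    r"(?P<lo>(?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
    r"(?P<hob>(?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
    r"(?P<dev>[0-9a-f]{2})",
    re.IGNORECASE,
)

# Only a few opcodes from the ATA command set (all 48-bit); extend as needed.
COMMAND_NAMES = {
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0x60: "READ FPDMA QUEUED",
    0x61: "WRITE FPDMA QUEUED",
}
NCQ_COMMANDS = {0x60, 0x61}
SECTOR_SIZE = 512  # sdc reports 512-byte sectors


def decode_taskfile(line: str) -> dict:
    m = TASKFILE_RE.search(line)
    if m is None:
        raise ValueError("no libata taskfile in line")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in m["lo"].split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in m["hob"].split(":")
    )
    command = int(m["cmd"], 16)
    # 48-bit LBA: the "hob" (high order byte) registers carry bits 47:24
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    info = {"command": command, "name": COMMAND_NAMES.get(command, "unknown"), "lba": lba}
    if command in NCQ_COMMANDS:
        # NCQ: the sector count lives in the feature registers,
        # and the queue tag in bits 7:3 of nsect
        sectors = (hob_feature << 8) | feature
        info["tag"] = nsect >> 3
    else:
        sectors = (hob_nsect << 8) | nsect
    # 48-bit commands: a count of 0 means 65536 sectors
    info["sectors"] = sectors or 65536
    info["bytes"] = info["sectors"] * SECTOR_SIZE
    return info


info = decode_taskfile(
    "ata3.00: cmd 61/18:00:4f:22:b3/00:00:05:00:00/40 tag 0 ncq 12288 out"
)
print(info["name"], info["sectors"], info["bytes"], hex(info["lba"]), info["tag"])
```

Under these assumptions the failed command is a WRITE FPDMA QUEUED (an NCQ write) of 24 sectors on tag 0 at LBA 0x5b3224f; the 12288 bytes agree with the `ncq 12288` in the log, and the `res` line then reports it as timed out (Emask 0x4).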

At the time of these messages, the machine was not loaded. In
particular, no SMART self-test was running. I mention this because
I have configured smartd to run short tests daily and extended tests
weekly, and those only rarely complete:

  # 1  Extended offline    Interrupted (host reset)      00%      4812         -
  # 2  Short offline       Interrupted (host reset)      00%      4684         -

while they run fine for ata[12].
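To see how often the self-tests are cut short, the rows of the self-test log can be tallied. This is a minimal sketch that assumes the column layout of `smartctl -l selftest` rows is exactly as in the two lines quoted above (number, description, status, remaining percentage, lifetime hours, LBA of first error), with columns separated by two or more spaces; `parse_selftest_log` and `count_host_resets` are hypothetical names.

```python
import re

# Assumed row layout of `smartctl -l selftest` output, columns separated
# by two or more spaces:
# Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error
ROW_RE = re.compile(
    r"^#\s*(?P<num>\d+)\s+"
    r"(?P<test>\S.*?)\s{2,}"
    r"(?P<status>\S.*?)\s{2,}"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\S+)\s*$"
)


def parse_selftest_log(text: str) -> list[dict]:
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m is None:
            continue  # header lines and anything else that is not a test row
        rows.append({
            "num": int(m["num"]),
            "test": m["test"],
            "status": m["status"],
            "remaining_pct": int(m["remaining"]),
            "hours": int(m["hours"]),
            # "-" means no error LBA was recorded
            "first_error_lba": None if m["lba"] == "-" else m["lba"],
        })
    return rows


def count_host_resets(rows: list[dict]) -> int:
    """Number of tests aborted because the host reset the link."""
    return sum(1 for row in rows if "host reset" in row["status"])


LOG = """\
# 1  Extended offline    Interrupted (host reset)      00%      4812         -
# 2  Short offline       Interrupted (host reset)      00%      4684         -
"""
rows = parse_selftest_log(LOG)
print(count_host_resets(rows), "of", len(rows), "tests interrupted by a host reset")
```

Running the same tally on the ata1/ata2 disks would give a baseline to compare against.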

The chipset is nVidia MCP55. Am I dealing with a broken controller?
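The `exception` lines carry bitmasks that say what kind of failure libata saw. The sketch below decodes the error mask and the SAct tag mask; the bit values are an assumption mirroring the `AC_ERR_*` constants in `include/linux/libata.h` and should be checked against the headers of the running kernel. The `action` field is deliberately not decoded here. `decode_emask`, `active_tags` and `parse_exception` are hypothetical helpers.

```python
import re

# Error-mask bits, assumed to mirror the AC_ERR_* constants in
# include/linux/libata.h; check them against your kernel's headers.
AC_ERR_BITS = {
    1 << 0: "DEV",       # device reported an error
    1 << 1: "HSM",       # host state machine violation
    1 << 2: "TIMEOUT",   # command timed out
    1 << 3: "MEDIA",     # media error
    1 << 4: "ATA_BUS",   # ATA bus (link) error
    1 << 5: "HOST_BUS",  # host bus error
    1 << 6: "SYSTEM",    # system error
    1 << 7: "INVALID",   # invalid argument
    1 << 8: "OTHER",     # unknown
}
KNOWN_BITS = sum(AC_ERR_BITS)  # sums the dict keys: 0x1ff

EXCEPTION_RE = re.compile(
    r"Emask 0x([0-9a-f]+) SAct 0x([0-9a-f]+) SErr 0x([0-9a-f]+)", re.IGNORECASE
)


def decode_emask(emask: int) -> list[str]:
    names = [name for bit, name in AC_ERR_BITS.items() if emask & bit]
    if emask & ~KNOWN_BITS:
        names.append(f"unknown(0x{emask & ~KNOWN_BITS:x})")
    return names


def active_tags(mask: int) -> list[int]:
    """Bit n set in SAct (or qc_active, defer_bits) refers to NCQ tag n."""
    return [tag for tag in range(32) if (mask >> tag) & 1]


def parse_exception(line: str) -> dict:
    m = EXCEPTION_RE.search(line)
    if m is None:
        raise ValueError("not a libata exception line")
    emask, sact, serr = (int(g, 16) for g in m.groups())
    return {"emask": decode_emask(emask), "active_tags": active_tags(sact), "serr": serr}


first = parse_exception("ata3.00: exception Emask 0x0 SAct 0x7fff SErr 0x0 action 0x6 frozen")
later = parse_exception("ata3: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4")
print(first, later, decode_emask(0x4))
```

Read this way, the first event had tags 0 to 14 (15 NCQ commands) outstanding when it froze the port, the failed command itself carries Emask 0x4 (TIMEOUT), and the later event is an ATA_BUS error, logged together with the "hot plug" message.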

Cheers,

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
"i like young girls. their stories are shorter."
                                                        -- tom mcguane
 
spamtraps: madduck.bogus@madduck.net

* Re: What are these ATA exceptions trying to tell me? [2.6.26] System Events]
From: Robert Hancock @ 2009-10-27  0:28 UTC (permalink / raw)
  To: linux kernel mailing list

On 10/26/2009 02:23 AM, martin f krafft wrote:
> Dear folks,
>
> I have a quality and high performance, new rack-mounted system, but
> every now and then, the kernel spews a slew of messages like the
> following to syslog:
>
>    [...]
>
> This happens for ata[34], but never for ata[12].

The "SATA link down" part is really quite abnormal; it seems like the
drive dropped off the SATA link. I'm rather suspicious of some kind of
hardware problem.
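The "link down" reading can be checked against the SStatus and SControl values that libata prints (in hex, without a `0x` prefix). This sketch assumes the register layout from the Serial ATA specification, where both registers split into DET (bits 3:0), SPD (bits 7:4) and IPM (bits 11:8) fields; the helper names are hypothetical.

```python
import re

# SStatus and SControl share one layout (assumed from the SATA spec):
# bits 3:0 = DET, bits 7:4 = SPD, bits 11:8 = IPM.
SSTATUS_DET = {
    0x0: "no device detected, PHY offline",
    0x1: "device present, PHY communication not established",
    0x3: "device present, PHY communication established",
    0x4: "PHY offline (interface disabled or in loopback)",
}
SSTATUS_SPD = {
    0x0: "no speed negotiated",
    0x1: "Gen1 (1.5 Gbps)",
    0x2: "Gen2 (3.0 Gbps)",
    0x3: "Gen3 (6.0 Gbps)",
}
SSTATUS_IPM = {
    0x0: "no device or communication not established",
    0x1: "active",
    0x2: "partial power state",
    0x6: "slumber power state",
}
SCONTROL_DET = {0x0: "no action", 0x1: "COMRESET requested", 0x4: "PHY disabled"}
SCONTROL_SPD = {0x0: "no speed limit", 0x1: "limited to Gen1", 0x2: "limited to Gen2"}
SCONTROL_IPM = {
    0x0: "no power-management restrictions",
    0x1: "partial transitions disabled",
    0x2: "slumber transitions disabled",
    0x3: "partial and slumber transitions disabled",
}

LINK_RE = re.compile(r"SStatus ([0-9A-Fa-f]+) SControl ([0-9A-Fa-f]+)")


def _fields(reg: int) -> tuple[int, int, int]:
    """Split a register into its (DET, SPD, IPM) nibbles."""
    return reg & 0xF, (reg >> 4) & 0xF, (reg >> 8) & 0xF


def _lookup(table: dict, value: int) -> str:
    return table.get(value, f"reserved (0x{value:x})")


def decode_sstatus(reg: int) -> dict:
    det, spd, ipm = _fields(reg)
    return {"det": _lookup(SSTATUS_DET, det), "spd": _lookup(SSTATUS_SPD, spd),
            "ipm": _lookup(SSTATUS_IPM, ipm)}


def decode_scontrol(reg: int) -> dict:
    det, spd, ipm = _fields(reg)
    return {"det": _lookup(SCONTROL_DET, det), "spd": _lookup(SCONTROL_SPD, spd),
            "ipm": _lookup(SCONTROL_IPM, ipm)}


def decode_link_line(line: str) -> tuple[dict, dict]:
    m = LINK_RE.search(line)
    if m is None:
        raise ValueError("no SStatus/SControl values in line")
    # libata prints both registers in hex without a 0x prefix
    return decode_sstatus(int(m.group(1), 16)), decode_scontrol(int(m.group(2), 16))


down, _ = decode_link_line("ata3: SATA link down (SStatus 0 SControl 300)")
up, ctl = decode_link_line("ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)")
print(down["det"], "|", up, "|", ctl)
```

Under this layout, "SStatus 0" means the PHY saw no device at all, while "SStatus 123" after the retry is a healthy Gen2 link in the active state; SControl 300 only disables partial and slumber power transitions.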

Are the two sets of disks the same model?

* Re: What are these ATA exceptions trying to tell me? [2.6.26] System Events]
From: martin f krafft @ 2009-10-27  6:42 UTC (permalink / raw)
  To: Robert Hancock; +Cc: linux kernel mailing list

also sprach Robert Hancock <hancockrwd@gmail.com> [2009.10.27.0128 +0100]:
> The "SATA link down" part is really quite abnormal, it seems like the
> drive dropped off the SATA link. Rather suspicious of some kind of
> hardware problem..
> 
> Are the two sets of disks the same model?

Yes, all four disks are the same model, and they are attached to four
identical MCP55 SATA ports. So either those two disks are broken by
chance, or the controller is broken.

Cheers,

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
"a mathematician is a device for turning coffee into theorems."
                                                         -- paul erdös
 
spamtraps: madduck.bogus@madduck.net
