linux-kernel.vger.kernel.org archive mirror
From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Jeff Garzik <jeff@garzik.org>
Cc: Niel Lambrechts <niel.lambrechts@gmail.com>,
	"linux.kernel" <linux-kernel@vger.kernel.org>,
	Linux IDE mailing list <linux-ide@vger.kernel.org>,
	Arjan van de Ven <arjan@infradead.org>,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	linux-scsi <linux-scsi@vger.kernel.org>
Subject: Re: 2.6.29 regression: ATA bus errors on resume
Date: Wed, 25 Mar 2009 17:16:18 -0500
Message-ID: <1238019378.3281.41.camel@localhost.localdomain>
In-Reply-To: <49C9CA03.7040603@garzik.org>

On Wed, 2009-03-25 at 02:06 -0400, Jeff Garzik wrote:
> Niel Lambrechts wrote:
> > On 03/25/2009 03:30 AM, Theodore Tso wrote:
> >> On Tue, Mar 24, 2009 at 10:25:57PM +0200, Niel Lambrechts wrote:
> >>> Hi,
> >>>
> >>> After upgrading to 2.6.29 I get the below errors after resuming from
> >>> hibernating with s2disk. I ran fsck and tried doing the same thing again
> >>> in 2.6.28.9-pae, but do not get any errors there.
> >> The ext4 errors are interleaved with hardware errors, and the ext4
> >> errors are about I/O errors.
> >>
> >> EXT4-fs error (device sda6): __ext4_get_inode_loc: unable to read inode block - inode=2346519
> >> EXT4-fs error (device sda6) in ext4_reserve_inode_write: IO failure
> >>
> >> This looks more like a hibernation problem than an ext4 problem.
> >> Looks like the hard drive is being left in some inconsistent state
> >> after resuming from hibernation.
> >>
> >>      	   	       		   	   - Ted
> > 
> > Thanks for the info Theodore, this definitely looks like some type of
> > regression in 2.6.29, as the problem is not evident when I s2disk using
> > 2.6.28.9, even after multiple suspend/resume cycles.
> > 
> > I found some 'ATA bus errors' and 'SError' messages in
> > /var/log/messages, so I've attached the messages from both 2.6.29 and
> > 2.6.28 for comparison.
> 
> Well, here is the interpretation of messages:
> 
> > ata1.00: irq_stat 0x00400008, PHY RDY changed
> > ata1: SError: { PHYRdyChg CommWake }
> 
> Your SATA hardware flags a connect-or-disconnect event ("PHY RDY"), 
> which requires us to abort a bunch of queued commands:
> 
> > ata1.00: cmd 60/18:00:77:88:6f/00:00:0e:00:00/40 tag 0 ncq 12288 in
> >          res 50/00:30:07:b3:10/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
> [...]
> > ata1.00: cmd 60/30:68:07:b3:10/00:00:0c:00:00/40 tag 13 ncq 24576 in
> >          res 50/00:30:07:b3:10/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
> 
>   ...through the 14th command (tag 13).
> 
> > Mar 24 21:29:14 linux-7vph kernel: ata1: hard resetting link
> > Mar 24 21:29:14 linux-7vph kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > Mar 24 21:29:14 linux-7vph kernel: ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded
> > Mar 24 21:29:14 linux-7vph kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> > Mar 24 21:29:14 linux-7vph kernel: ata1.00: ACPI cmd ef/5f:00:00:00:00:a0 succeeded
> > Mar 24 21:29:14 linux-7vph kernel: ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out
> > Mar 24 21:29:14 linux-7vph kernel: ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded
> > Mar 24 21:29:14 linux-7vph kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
> > Mar 24 21:29:14 linux-7vph kernel: ata1.00: ACPI cmd ef/5f:00:00:00:00:a0 succeeded
> > Mar 24 21:29:14 linux-7vph kernel: ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out
> > Mar 24 21:29:14 linux-7vph kernel: ata1.00: configured for UDMA/133
> > Mar 24 21:29:14 linux-7vph kernel: ata1.00: configured for UDMA/133
> 
> 
> SATA link is reset, and ACPI is re-run.
> 
> > Mar 24 21:29:14 linux-7vph kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
> > Mar 24 21:29:14 linux-7vph kernel: sd 0:0:0:0: [sda] Sense Key : Aborted Command [current] [descriptor]
> > Mar 24 21:29:14 linux-7vph kernel: Descriptor sense data with sense descriptors (in hex):
> > Mar 24 21:29:14 linux-7vph kernel:         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
> > Mar 24 21:29:14 linux-7vph kernel:         0c 10 b3 07 
> > Mar 24 21:29:14 linux-7vph kernel: sd 0:0:0:0: [sda] Add. Sense: No additional sense information
> > Mar 24 21:29:14 linux-7vph kernel: end_request: I/O error, dev sda, sector 242190455
> 
> The SCSI subsystem aborts each of the queued commands.

No. What the log shows is the SCSI subsystem receiving an ABORTED COMMAND
return in sense data for each of the outstanding I/Os.
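
For readers who want to check this against the quoted log, here is a minimal
sketch that decodes the 20 sense bytes printed above. It assumes the buffer is
descriptor-format sense data (response code 0x72) carrying a single
information descriptor, which is what the "[current] [descriptor]" line
suggests; the function name and the returned dictionary are made up for this
illustration and are not kernel interfaces.

```python
# Sense bytes copied verbatim from the quoted kernel log.
SENSE_HEX = "72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 0c 10 b3 07"

# Only the sense key that matters in this thread.
SENSE_KEYS = {0x0B: "ABORTED COMMAND"}


def decode_descriptor_sense(raw: bytes) -> dict:
    """Decode descriptor-format sense data (hypothetical helper)."""
    if raw[0] & 0x7F != 0x72:
        raise ValueError("not current descriptor-format sense data")
    result = {
        "sense_key": raw[1] & 0x0F,  # low nibble of byte 1
        "asc": raw[2],               # additional sense code
        "ascq": raw[3],              # additional sense code qualifier
        "information": None,
    }
    offset = 8                       # descriptors start after the 8-byte header
    end = min(8 + raw[7], len(raw))  # byte 7 is the additional sense length
    while offset + 2 <= end:
        desc_type = raw[offset]
        desc_len = raw[offset + 1]
        body = raw[offset + 2:offset + 2 + desc_len]
        # Type 0x00 is the information descriptor: VALID bit, one reserved
        # byte, then an 8-byte big-endian information field.
        if desc_type == 0x00 and len(body) >= 10 and body[0] & 0x80:
            result["information"] = int.from_bytes(body[2:10], "big")
        offset += 2 + desc_len
    return result


decoded = decode_descriptor_sense(bytes.fromhex(SENSE_HEX))
print(SENSE_KEYS.get(decoded["sense_key"], hex(decoded["sense_key"])))
print(hex(decoded["asc"]), hex(decoded["ascq"]), hex(decoded["information"]))
```

The decoded key is 0x0b (ABORTED COMMAND) with ASC/ASCQ of zero, matching the
"No additional sense information" line, and the information field is
0x0c10b307, the same LBA that appears in the `res` taskfile lines of the log.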

The only place these are generated is in ata_to_sense_error(), which is
only called if there's some type of ATA error.

If I had to theorise, I'd say the system suspended with commands
outstanding to the device.  On resume, the device gets reset and returns
some type of ATA error which gets translated to ABORTED COMMAND which
causes a failure.

In the mid layer, we retry a command that returns ABORTED_COMMAND until it
runs out of retries. Could it be that there's a race in readying the device,
and we run through the retries before it can accept the command?
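
The suspected race can be illustrated with a toy model. This is only a sketch
of the hypothesis, not the mid layer's actual retry code: the function name,
the retry budget of 5, and the notion of a device becoming ready after a fixed
number of attempts are all assumptions made for illustration.

```python
def run_command(retries_allowed: int, ready_after_attempts: int):
    """Toy model of the hypothesised race (not kernel code).

    Every attempt issued before the device is ready comes back as
    ABORTED COMMAND and consumes one retry. The device is assumed to
    accept commands only after `ready_after_attempts` attempts have
    already been rejected.
    """
    attempts = 0
    # One initial attempt plus `retries_allowed` retries.
    while attempts <= retries_allowed:
        attempts += 1
        if attempts > ready_after_attempts:
            return "ok", attempts        # device was ready in time
    return "I/O error", attempts         # retry budget exhausted first


# Device recovers quickly: the command succeeds on a retry.
print(run_command(retries_allowed=5, ready_after_attempts=3))

# Device is still not ready when the retries run out: the command fails.
print(run_command(retries_allowed=5, ready_after_attempts=10))
```

If retries are issued back to back with no delay, the second case needs only
a short readiness window after the reset to turn into the I/O errors seen in
the log.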

James



