From: Niel Lambrechts <niel.lambrechts@gmail.com>
To: Tejun Heo <tj@kernel.org>, "linux.kernel" <linux-kernel@vger.kernel.org>
Subject: Re: 2.6.29 regression: ATA bus errors on resume
Date: Mon, 30 Mar 2009 16:30:32 +0200
Message-ID: <49D0D788.6070405@gmail.com>
In-Reply-To: <cllvN-2Gf-1@gated-at.bofh.it>

On 03/30/2009 11:00 AM, Tejun Heo wrote:
> Hello,
> 
> For some reason, I can't find the original thread, so replying here.
> 
> Niel Lambrechts wrote:
>>>>>> The ext4 errors are interleaved with hardware errors, and the ext4
>>>>>> errors are about I/O errors.
>>>>>>
>>>>>> EXT4-fs error (device sda6): __ext4_get_inode_loc: unable to read inode block - inode=2346519
>>>>>> EXT4-fs error (device sda6) in ext4_reserve_inode_write: IO failure
>>>>>>
>>>>>> This looks more like a hibernation problem than an ext4 problem.
>>>>>> Looks like the hard drive is being left in some inconsistent state
>>>>>> after resuming from hibernation.
> 
> Yeap, ext4 is just the victim here.
> 
>>>>> ata1.00: irq_stat 0x00400008, PHY RDY changed
>>>>> ata1: SError: { PHYRdyChg CommWake }
>>>> Your SATA hardware flags a connect-or-disconnect event ("PHY RDY"), 
>>>> which requires us to abort a bunch of queued commands:
>>>>
>>>>> ata1.00: cmd 60/18:00:77:88:6f/00:00:0e:00:00/40 tag 0 ncq 12288 in
>>>>>          res 50/00:30:07:b3:10/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
>>>> [...]
> ...
>>>> The SCSI subsystem aborts each of the queued commands.
>>> No .. this is the SCSI subsystem receiving an ABORTED COMMAND return in
>>> sense data for each of the outstanding I/Os.
>>>
>>> The only place these are generated is in ata_to_sense_error(), which only
>>> runs if there's some type of ATA error.
>>>
>>> If I had to theorise, I'd say the system suspended with commands
>>> outstanding to the device.  On resume, the device gets reset and returns
>>> some type of ATA error which gets translated to ABORTED COMMAND which
>>> causes a failure.
>>>
>>> In the mid layer, we translate ABORTED_COMMAND into a retry until the
>>> command runs out of them ... could it be there's a race readying the
>>> device and we run through the retries before it can accept the command?
> 
> When libata-eh thinks that the problem isn't worth retrying, it sets
> scmd->retries to scmd->allowed so that it gets aborted immediately.
> The code is in ata_eh_qc_complete().
> 
> Whether a command is to be retried or not is determined by
> ATA_QCFLAG_RETRY, which is set in ata_eh_link_autopsy() for each failed
> command.  The immediate-failure criteria are pretty strict: only driver
> software errors (AC_ERR_INVALID) and PC or other special commands
> that were aborted by the device get the immediate pink slip.  In this
> case, the commands are from the FS and failed with AC_ERR_ATA_BUS, so
> they definitely don't fit the criteria.  Strange.
> 
> How reproducible is the problem?  Are you interested in trying out
> some debug patches?
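
The retry rule described above can be modelled in a few lines. This is
only a rough user-space sketch in Python, not the kernel code: the
function names and the is_fs_io argument are made up for illustration,
and the AC_ERR_* values are assumed from libata.h (0x10 does match the
"Emask 0x10 (ATA bus error)" in the log).

```python
# Simplified model of the retry decision described above; NOT kernel code.
# Constant values are assumptions taken from memory of include/linux/libata.h.
AC_ERR_DEV = 0x01       # device reported an error (e.g. aborted the command)
AC_ERR_ATA_BUS = 0x10   # matches "Emask 0x10 (ATA bus error)" in the log
AC_ERR_INVALID = 0x80   # driver software error


def worth_retrying(err_mask, is_fs_io):
    """Return True if a failed command should be flagged for retry.

    is_fs_io is a hypothetical stand-in for "this is normal filesystem
    I/O" as opposed to a PC or other special command.
    """
    if err_mask & AC_ERR_INVALID:
        return False                      # driver bug: fail immediately
    if not is_fs_io and err_mask == AC_ERR_DEV:
        return False                      # special command aborted by device
    return True


def retries_after_eh(err_mask, is_fs_io, retries, allowed):
    """Model of the SCSI retry counter once error handling completes.

    When the command is not worth retrying, the counter is pushed to the
    limit so the command fails at once instead of being requeued.
    """
    if worth_retrying(err_mask, is_fs_io):
        return retries
    return allowed


if __name__ == "__main__":
    # Filesystem read that failed with an ATA bus error, as in this report.
    print(worth_retrying(AC_ERR_ATA_BUS, is_fs_io=True))
```

Under this model the failing reads from the report would be flagged for
retry, which is why the immediate failure looks strange.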

Hi Tejun,

I think I should be able to reproduce it when actively using X with 2.6.29,
and I have an external disk that I could back up to and boot from if the
corruption becomes a problem.

These issues are keeping me from 2.6.29, so I'll gladly help where I can.
Could you please send me the patches and any .config settings that
may be required?

Niel
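
For anyone reading the failed command quoted earlier in the thread, the
"cmd" line can be decoded by hand. The sketch below is a rough Python
helper (decode_ncq_taskfile is a made-up name, not an existing tool). It
assumes the usual libata field order of
command/feature:nsect:lbal:lbam:lbah/hob_* (same order)/device, the NCQ
convention that the sector count lives in the feature registers and the
tag in the upper bits of nsect, and 512-byte sectors.

```python
def decode_ncq_taskfile(tf):
    """Decode a libata 'cmd' taskfile string for an NCQ command.

    Assumed layout (an assumption, see the note above):
      CMD/FEAT:NSECT:LBAL:LBAM:LBAH/HOB_FEAT:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEV
    """
    cmd, low, hob, dev = tf.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in hob.split(":")
    )

    # For NCQ commands the sector count is carried in the feature registers.
    sectors = (hob_feature << 8) | feature
    # 48-bit LBA assembled from the six LBA bytes, most significant first.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "command": int(cmd, 16),   # 0x60 is READ FPDMA QUEUED
        "device": int(dev, 16),
        "tag": nsect >> 3,         # NCQ tag sits in bits 7:3 of nsect
        "sectors": sectors,
        "bytes": sectors * 512,    # assumes 512-byte logical sectors
        "lba": lba,
    }


if __name__ == "__main__":
    info = decode_ncq_taskfile("60/18:00:77:88:6f/00:00:0e:00:00/40")
    print(hex(info["command"]), info["tag"], info["bytes"], hex(info["lba"]))
```

Applied to the line from the log, this gives tag 0 and 12288 bytes, which
agrees with the "tag 0 ncq 12288 in" printed by the kernel.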

