From: Niel Lambrechts <niel.lambrechts@gmail.com>
To: Jeff Garzik <jeff@garzik.org>,
"linux.kernel" <linux-kernel@vger.kernel.org>
Subject: Re: 2.6.29 regression: ATA bus errors on resume
Date: Mon, 30 Mar 2009 20:24:17 +0200
Message-ID: <49D10E51.8000104@devnull.org>
In-Reply-To: <clqYt-3bu-5@gated-at.bofh.it>

On 03/30/2009 04:50 PM, Jeff Garzik wrote:
> Niel Lambrechts wrote:
>> On 03/30/2009 11:00 AM, Tejun Heo wrote:
>>> Hello,
>>>
>>> For some reason, I can't find the original thread, so replying here.
>>>
>>> Niel Lambrechts wrote:
>>>>>>>> The ext4 errors are interleaved with hardware errors, and the ext4
>>>>>>>> errors are about I/O errors.
>>>>>>>>
>>>>>>>> EXT4-fs error (device sda6): __ext4_get_inode_loc: unable to read inode block - inode=2346519
>>>>>>>> EXT4-fs error (device sda6) in ext4_reserve_inode_write: IO failure
>>>>>>>>
>>>>>>>> This looks more like a hibernation problem than an ext4 problem.
>>>>>>>> Looks like the hard drive is being left in some inconsistent state
>>>>>>>> after resuming from hibernation.
>>> Yeap, ext4 is just the victim here.
>>>
>>>>>>> ata1.00: irq_stat 0x00400008, PHY RDY changed
>>>>>>> ata1: SError: { PHYRdyChg CommWake }
>>>>>> Your SATA hardware flags a connect-or-disconnect event ("PHY
>>>>>> RDY"), which requires us to abort a bunch of queued commands:
>>>>>>
>>>>>>> ata1.00: cmd 60/18:00:77:88:6f/00:00:0e:00:00/40 tag 0 ncq 12288 in
>>>>>>> res 50/00:30:07:b3:10/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
>>>>>> [...]
>>> ...
>>>>>> The SCSI subsystem aborts each of the queued commands.
>>>>> No, this is the SCSI subsystem receiving an ABORTED COMMAND return in
>>>>> sense data for each of the outstanding I/Os.
>>>>>
>>>>> The only place these are generated is in ata_sense_to_error() which
>>>>> only occurs if there's some type of ata error.
>>>>>
>>>>> If I had to theorise, I'd say the system suspended with commands
>>>>> outstanding to the device. On resume, the device gets reset and
>>>>> returns some type of ATA error which gets translated to ABORTED
>>>>> COMMAND which causes a failure.
>>>>>
>>>>> In the mid layer, we translate ABORTED_COMMAND into a retry until the
>>>>> command runs out of them ... could it be there's a race readying the
>>>>> device and we run through the retries before it can accept the
>>>>> command?
>>> When libata-eh thinks that the problem isn't worth retrying, it sets
>>> scmd->retries to scmd->allowed so that it gets aborted immediately.
>>> The code is in ata_eh_qc_complete().
>>>
>>> Whether a command is to be retried or not is determined with
>>> ATA_QCFLAG_RETRY, which is set in ata_eh_link_autopsy() for each
>>> failed command. The immediate-failure criteria are pretty strict:
>>> only driver software errors (AC_ERR_INVALID) and PC or other special
>>> commands which failed because the device aborted them get the
>>> immediate pink slip. In this case, the commands are from the FS and
>>> failed with AC_ERR_ATA_BUS, so they definitely don't fit the criteria.
>>> Strange.
>>>
>>> How reproducible is the problem? Are you interested in trying out
>>> some debug patches?
>>
>> Hi Tejun,
>>
>> I think I should be able to reproduce when actively using X with 2.6.29,
>> and I have an external disk where I could backup to / boot from if the
>> corruption became a problem.
>>
>> These issues are keeping me from 2.6.29 so I'll gladly help where I can,
>> if you can please provide me the patches and the .config settings that
>> may be required?
>>
>> Niel
>
> Any chance you could use bisect to narrow down the problem commit?
>
> http://kernel.org/pub/software/scm/git/docs/v1.4.4.4/howto/isolate-bugs-with-bisect.txt
>
>
> This should identify which patch caused your problems, if you have a
> known good starting point (such as 2.6.28).
>
> Jeff

Any idea how much data I would need to download, git repository-wise?
I currently only have the 2.6.27 source and patches on top... and
bandwidth is quite expensive in SA...

Niel
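
For reference, the retry decision that Tejun describes in the quoted text above can be modelled roughly as follows. This is a simplified, self-contained sketch and not the kernel code: the struct layouts, the `MODEL_ERR_*` values and every `model_*` function are hypothetical stand-ins, written only from the description in this thread (a retry flag decided during autopsy, and `retries` set to `allowed` on completion when the command is not worth retrying).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the AC_ERR_* classes named in the thread. */
enum model_err {
    MODEL_ERR_DEV,      /* device aborted the command */
    MODEL_ERR_ATA_BUS,  /* bus error, as in the logs quoted above */
    MODEL_ERR_INVALID   /* driver software error */
};

/* Hypothetical, heavily reduced view of a SCSI command's retry budget. */
struct model_scmd {
    int retries;  /* attempts used so far */
    int allowed;  /* attempts permitted */
};

struct model_qc {
    enum model_err err;
    bool is_fs_io;  /* regular filesystem I/O, not a PC/special command */
    bool retry;     /* models ATA_QCFLAG_RETRY */
    struct model_scmd scmd;
};

/* Models the autopsy step: decide whether the failed command is retried. */
static void model_autopsy(struct model_qc *qc)
{
    bool immediate_fail =
        qc->err == MODEL_ERR_INVALID ||
        (!qc->is_fs_io && qc->err == MODEL_ERR_DEV);

    qc->retry = !immediate_fail;
}

/* Models completion: a command not worth retrying has its budget used up. */
static void model_complete(struct model_qc *qc)
{
    if (!qc->retry)
        qc->scmd.retries = qc->scmd.allowed;
}

/* True while the upper layer would still reissue the command. */
static bool model_will_retry(const struct model_qc *qc)
{
    return qc->scmd.retries < qc->scmd.allowed;
}

/* The case from this thread: filesystem I/O failing with a bus error. */
static void model_self_check(void)
{
    struct model_qc qc = {
        .err = MODEL_ERR_ATA_BUS,
        .is_fs_io = true,
        .scmd = { .retries = 0, .allowed = 5 },
    };

    model_autopsy(&qc);
    model_complete(&qc);
    assert(model_will_retry(&qc));
}
```

Under this model a filesystem command that fails with a bus error keeps its retry budget, which is why the immediate failures seen in the logs look strange.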