From: Tejun Heo <htejun@gmail.com>
To: Mark Lord <liml@rtr.ca>
Cc: Linus Torvalds <torvalds@osdl.org>, Jeff Garzik <jeff@garzik.org>,
Andrew Morton <akpm@osdl.org>,
linux-ide@vger.kernel.org
Subject: Re: Scary Intel SATA problem: "frozen"
Date: Wed, 29 Nov 2006 10:12:25 +0900
Message-ID: <456CDE79.10709@gmail.com>
In-Reply-To: <456C73D8.3050803@rtr.ca>

Mark Lord wrote:
> Linus Torvalds wrote:
>> [ You may or may not have gotten my previous email. The kernel stayed
>> working, but due to the IO errors the filesystem got re-mounted
>> read-only, and I'm not sure that the email I sent out in that state
>> actually ever made it out. I suspect it didn't. ]
>>
>> Jeff,
>> I just had a scary thing on my nice new Intel i965 box (all Intel
>> chipsets apart from some strange Marvell IDE interface that I'm not
>> using and that no driver even detected, and a TI firewire thing that
>> I'm similarly not using).
>>
>> The machine basically froze for about a minute or so (well, things
>> worked surprisingly well, considering that apparently no disk IO
>> happened - I initially thought it was just firefox that had frozen up,
>> since my mail session seemed to be fine), and after it came back the
>> filesystem was mounted read-only and nothing really worked any more..
>>
>> I have no idea what status 0xD0 means: it looks like ATA_BUSY +
>> ATA_DRDY + "bit#4", but what is bit#4?
>
> Bit #4, when actually implemented, is a rotational seek indicator,
> which can be used for timing purposes.
>
> But when BUSY (bit #7) is set, the rest are generally nonsense.
>
>> And clearly, the soft-reset isn't doing squat.

I dunno. My first suspect is a transient transmission error, and yeah,
they do occur from time to time even on otherwise stable setups. For
example, my machine uses an nvidia ck804, which has pretty weak error
handling (or at least used to), and stays up 24/7, and I've seen such an
unrecovered transmission error just once in the last 6+ months.

In my experience, if something is off (say, a power fluctuation or
electromagnetic interference), SATA is the first thing to give out, and
that's why we need good EH (error handling) with SATA much more than we
do with PATA. Drives (and controllers too) sometimes fall into a weird
state after such errors, and softreset is often not enough, so we need
hardreset. ICH8 can do hardreset even in ata_piix mode. I'll work on it.

Linus, I'll follow up with Jonas, as his problem seems reproducible, but
I'm a bit skeptical about it being a driver issue. Even with all its
kinks, ata_piix is just an SFF IDE controller, and libata has been
handling it for a long time. I would be really surprised if the driver
or controller had any such issue in the usual read/write path. AHCI
should be able to recover from most error conditions unless the drive
firmware is completely stuck and requires a physical power-off.
--
tejun