Re: Scary Intel SATA problem: "frozen"

From: Tejun Heo <htejun@gmail.com>
To: Mark Lord <liml@rtr.ca>
Cc: Linus Torvalds <torvalds@osdl.org>, Jeff Garzik <jeff@garzik.org>,
	Andrew Morton <akpm@osdl.org>,
	linux-ide@vger.kernel.org
Subject: Re: Scary Intel SATA problem: "frozen"
Date: Wed, 29 Nov 2006 10:12:25 +0900
Message-ID: <456CDE79.10709@gmail.com>
In-Reply-To: <456C73D8.3050803@rtr.ca>

Mark Lord wrote:
> Linus Torvalds wrote:
>> [ You may or may not have gotten my previous email. The kernel stayed
>>   working, but due to the IO errors the filesystem got re-mounted
>>   read-only, and I'm not sure that the email I sent out in that state
>>   actually ever made it out. I suspect it didn't. ]
>>
>> Jeff,
>>  I just had a scary thing on my nice new Intel i965 box (all Intel 
>> chipsets apart from some strange Marvell IDE interface that I'm not 
>> using and that no driver even detected, and a TI firewire thing that 
>> I'm similarly not using).
>>
>> The machine basically froze for about a minute or so (well, things 
>> worked surprisingly well, considering that apparently no disk IO 
>> happened - I initially thought it was just firefox that had frozen up, 
>> since my mail session seemed to be fine), and after it came back the 
>> filesystem was mounted read-only and nothing really worked any more..
>>
>> I have no idea what status 0xD0 means: it looks like ATA_BUSY + 
>> ATA_DRDY + "bit#4", but what is bit#4?
> 
> Bit #4, when actually implemented, is a rotational seek indicator,
> which can be used for timing purposes.
> 
> But when BUSY (bit #7) is set, the rest are generally nonsense.
> 
>> And clearly, the soft-reset isn't doing squat.

I dunno.  My first suspect is a transient transmission error, and yeah,
they do occur from time to time even on an otherwise stable setup.  For
example, my machine is an nvidia ck804, which has pretty weak error
handling (at least it used to), and it stays up 24/7.  I've seen such an
unrecovered transmission error just once during the last 6+ months.

My experience is that if something is weird (say, power fluctuation or
electromagnetic interference), SATA is the first thing to give out, and
that's why we need good EH (error handling) w/ SATA much more than we do
with PATA.

Drives (controllers too) sometimes fall into a weird state after such
errors, and softreset is often not enough, so we need hardreset.  ICH8
can do hardreset even in ata_piix mode.  I'll work on it.
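
To make the softreset-then-hardreset idea concrete, here is a rough
sketch of the escalation.  It is not libata code: struct reset_ops,
recover_port() and the two stub callbacks are hypothetical names made up
for illustration, and the number of softreset attempts is an arbitrary
parameter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Which reset, if any, brought the port back. */
enum reset_kind { RESET_NONE, RESET_SOFT, RESET_HARD };

/* Hypothetical callback type: returns true if the device came back. */
typedef bool (*reset_fn)(void *port);

/* Hypothetical per-controller hooks; hardreset may be NULL when the
 * controller has no way to issue one. */
struct reset_ops {
    reset_fn softreset;
    reset_fn hardreset;
};

/* Try the cheap softreset a few times, then escalate to hardreset. */
static enum reset_kind recover_port(const struct reset_ops *ops,
                                    void *port, int soft_tries)
{
    for (int i = 0; i < soft_tries; i++) {
        if (ops->softreset && ops->softreset(port))
            return RESET_SOFT;
    }
    if (ops->hardreset && ops->hardreset(port))
        return RESET_HARD;
    return RESET_NONE;
}

/* Stand-in callbacks for trying the sketch out; real ones would talk
 * to the hardware. */
static bool reset_always_fails(void *port) { (void)port; return false; }
static bool reset_always_works(void *port) { (void)port; return true; }
```

With a softreset that never succeeds and a working hardreset,
recover_port() reports RESET_HARD; with no hardreset hook at all it
reports RESET_NONE, which is the "stuck until power off" case.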

Linus, I'll follow up with Jonas as his problem seems reproducible, but
I'm a bit skeptical about it being a driver issue.  Even w/ all its
kinks, ata_piix is just an SFF IDE controller and libata has been
driving those for a long time.  I would be really surprised if the
driver or controller had any such issue in the usual r/w path.  AHCI
should be able to recover from most error conditions unless the drive
firmware is completely stuck and requires a physical power-off.
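
For the status 0xD0 question quoted above, a small decoder may help.
This is only a sketch: the ST_* constants are local to the example
(named after the ATA_BUSY / ATA_DRDY style used in the thread), and
ST_DSC is this sketch's label for the bit #4 seek indicator Mark
describes.

```c
#include <stdint.h>
#include <stdio.h>

/* ATA status register bit masks, local to this sketch. */
enum {
    ST_BUSY = 1 << 7,
    ST_DRDY = 1 << 6,
    ST_DF   = 1 << 5,
    ST_DSC  = 1 << 4,   /* bit #4: seek indicator, when implemented */
    ST_DRQ  = 1 << 3,
    ST_ERR  = 1 << 0,
};

/* Write the names of the set bits into buf, separated by spaces. */
static void decode_status(uint8_t status, char *buf, size_t len)
{
    static const struct { uint8_t mask; const char *name; } bits[] = {
        { ST_BUSY, "BUSY" }, { ST_DRDY, "DRDY" }, { ST_DF, "DF" },
        { ST_DSC, "DSC" },   { ST_DRQ, "DRQ" },   { ST_ERR, "ERR" },
    };
    size_t used = 0;

    if (len == 0)
        return;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
        if (!(status & bits[i].mask))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;              /* output truncated, stop here */
        used += (size_t)n;
    }
}

/* With BUSY set, the remaining bits should not be trusted. */
static int status_bits_valid(uint8_t status)
{
    return !(status & ST_BUSY);
}
```

Decoding 0xD0 this way gives BUSY, DRDY and DSC, and
status_bits_valid(0xD0) returns 0, which matches Mark's point that the
other bits are generally nonsense while BUSY is set.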

-- 
tejun
