From: Tejun Heo <htejun@gmail.com>
To: Jeff Garzik <jeff@garzik.org>
Cc: Mark Lord <lkml@rtr.ca>, auxsvr@gmail.com, linux-kernel@vger.kernel.org
Subject: Re: ata command timeout
Date: Wed, 21 Feb 2007 16:03:44 +0900
Message-ID: <45DBEED0.6050703@gmail.com>
In-Reply-To: <45DB15C2.7020206@garzik.org>

Jeff Garzik wrote:
> Mark Lord wrote:
>> I don't believe that.  Command timeouts never happen on healthy systems,
>> unless we have a driver bug.  Okay, so I can imagine a pathological case
>> of a full queue (NCQ) with all 32 commands taking longer than usual due
>> to ECC retries in the firmware..
> 
> It's not quite so black and white.  There have definitely been interrupt
> delivery problems that cause command timeouts.  Also, Intel PIIX BMDMA
> (all standard PCI IDE, I think?) is defined to /not/ send an interrupt,
> when a DMA error occurs.  The driver is instructed to time out the
> transaction, and start recovery by deducing the state of things from the
> DMA status bits.
> 
> Nonetheless, I mostly agree with your statement.  The two most common
> causes of timeouts that I see are interrupt delivery problems, and
> driver bugs.

Oh.. well.  My experience is that it's much more common on SATA than on
PATA.  The SATA link seems to be one of the parts most vulnerable to
interference.  When the PSU has the slightest problem, SATA drives
time out or show transmission problems.  The system often survives a
brief fluctuation in power input (e.g. when the compressor starts up),
but the SATA link sometimes reports errors after such an event.

Or just buy a static generator and apply it to your computer case.
Generally the system is perfectly okay with that, but the SATA devices
tend to complain or time out.

Those conditions might not be considered healthy in any server
environment, but they do occur in cheap desktop environments.  I mean, a
lot of people are putting 10 USD PSUs into their desktop machines.

So, yeah, it might be a driver or other problem, but if the problem is
very intermittent, I tend to lean toward a transient hardware problem,
and that's primarily why I want to make EH kick in and recover faster.

Thanks.

-- 
tejun
