linux-ide.vger.kernel.org archive mirror
From: Tejun Heo <htejun@gmail.com>
To: Robert Hancock <hancockr@shaw.ca>
Cc: Gabor Gombas <gombasg@sztaki.hu>,
	linux-kernel@vger.kernel.org, linux-ide@vger.kernel.org,
	kluo@nvidia.com, AMartin@nvidia.com, pchen@nvidia.com
Subject: Re: sata_nv + ADMA + Samsung disk problem
Date: Wed, 02 Jan 2008 13:25:23 +0900
Message-ID: <477B1233.3090706@gmail.com>
In-Reply-To: <477B10F2.9000107@shaw.ca>

Robert Hancock wrote:
>> This is kind of a longstanding problem which has been partially worked
>> around, but it seems not entirely. This is what I had diagnosed some
>> time ago:
>>
>> "recently, some issues cropped up with command timeouts when a cache
>> flush command was immediately followed by an NCQ write. In this case,
>> sometimes when the NCQ write was issued, the status register changed
>> from 0x500 (Stopped and Idle) to 0x400 (Stopped) as it normally
>> appears to, however it seems like the controller would get hung in
>> that state, and we would time out with no notifiers set, the gen_ctl
>> register not indicating interrupt status, and the CPB response flags
>> still 0 as we left them, seemingly indicating the controller hasn't
>> done anything with it. Then, when the error handler kicks in we clear
>> the GO bit to put it back into register mode, but the Legacy flag in
>> the status register doesn't get set (or at least it takes longer than
>> 1 microsecond). Finally when we do an ADMA channel reset that seems to
>> get it responding again, until this happens the next time.
>>
>> From some experimentation, I found that when we issue an NCQ command
>> after a non-NCQ command, or vice versa, adding a delay of 20
>> microseconds between setting up the CPB and writing to the append
>> register made the problem appear to go away. The problem is I don't
>> know if that's because the controller actually needs this delay, or
>> because it changes the timing so that things happen to work even though
>> we're doing something wrong (there's some event we're not waiting for,
>> etc.).
>>
>> I've now verified that no switches between ADMA and register mode
>> occur near the time of these timeouts. Neither are we reading or
>> writing any of the ATA shadow registers while we're in ADMA mode."
>>
>> It seems likely that this is what is happening here (a switch from an
>> NCQ command to a non-NCQ command, then the non-NCQ times out). It
>> could be in some cases the 20 microsecond delay is not enough. But it
>> seems bogus that we should need such an arbitrary delay in the first
>> place.
>>
>> The question I had for NVIDIA regarding this, which I never got
>> answered, was: is there any reason why we would need a delay when
>> switching between NCQ and non-NCQ commands on ADMA, and if not, is
>> there any known condition that could cause the controller to get into
>> this seemingly locked-up state?
> 
> Well, I guess I did sort of get an answer, but the only theory was that
> the flush and the NCQ commands were being overlapped, which shouldn't be
> possible (the libata core guarantees that, and if it didn't work it
> would affect all controllers).
> 
> I'm kind of wondering if there's something funny going on with the
> notifier register stuff, which is supposed to tell us what commands have
> completed. We don't really use it at all (we had some problems with
> missed completions, etc. when I tried using it, also it doesn't work if
> ATAPI is enabled on the other port on the controller, apparently). I
> know these controllers will do strange things like not signalling
> interrupts for later events if you don't clear the notifiers in just the
> right way (that being mostly determined by trial and error).
> 
> Or maybe the flush is somehow getting issued before the controller is
> really "ready" for it (it hasn't finished cleaning up after the
> preceding NCQ command).
> 
> It's pretty hard for me to figure out which of the above might be the
> case, especially without access to the detailed controller documentation.

Thanks a lot for the detailed explanation.  Nvidia ppl, any ideas?
FLUSH is used regularly.  We really need to fix this.

-- 
tejun
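
The workaround described in the quoted diagnosis above (a delay between
setting up the CPB and writing the append register whenever the command
type switches between NCQ and non-NCQ) can be sketched in user-space C.
This is only an illustration under stated assumptions, not code from
sata_nv: every name below is hypothetical, the two status bits are
inferred solely from the quoted values 0x500 (Stopped and Idle) and 0x400
(Stopped), the register width is not stated in the thread, and the 20
microsecond figure is the experimental value from the message, which the
author himself suspects may not be a real hardware requirement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names. Bit values inferred from the thread:
 * 0x500 = Stopped and Idle, 0x400 = Stopped. */
#define ADMA_STAT_IDLE    0x0100u
#define ADMA_STAT_STOPPED 0x0400u

/* Experimental delay from the message; not a documented requirement. */
#define MODE_SWITCH_DELAY_US 20u

enum cmd_kind { CMD_NONE, CMD_NON_NCQ, CMD_NCQ };

struct port_state {
    enum cmd_kind last_kind; /* kind of the previously issued command */
};

/* True for the 0x500 state: both Stopped and Idle are set. */
static bool adma_is_stopped_and_idle(uint32_t status)
{
    uint32_t mask = ADMA_STAT_STOPPED | ADMA_STAT_IDLE;
    return (status & mask) == mask;
}

/* True for the 0x400 state in which the controller was seen to hang:
 * Stopped is set but Idle is clear. */
static bool adma_is_stopped_not_idle(uint32_t status)
{
    return (status & ADMA_STAT_STOPPED) && !(status & ADMA_STAT_IDLE);
}

/* Returns how many microseconds to wait between filling in the CPB and
 * writing the append register, and records the new command kind.
 * A delay is requested only when the kind differs from the previous
 * command; the first command on a port needs none. */
static unsigned int delay_before_append_us(struct port_state *ps,
                                           enum cmd_kind next)
{
    bool switching = ps->last_kind != CMD_NONE && ps->last_kind != next;

    ps->last_kind = next;
    return switching ? MODE_SWITCH_DELAY_US : 0u;
}
```

A caller would keep one port_state per port, call
delay_before_append_us() for each command, and sleep for the returned
time before touching the append register. Whether such a delay addresses
the root cause is exactly the open question in this thread.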


Thread overview: 31+ messages
2007-08-08 12:08 sata_nv + ADMA + Samsung disk problem Gabor Gombas
2007-08-14  9:30 ` Tejun Heo
2007-08-14 12:02   ` Gabor Gombas
2007-08-16 16:06   ` Gabor Gombas
2007-08-16 18:45     ` Jim Paris
2008-01-01 16:44 ` Gabor Gombas
2008-01-02  3:25   ` Tejun Heo
2008-01-02  4:03     ` Robert Hancock
2008-01-02  4:20       ` Robert Hancock
2008-01-02  4:25         ` Tejun Heo [this message]
2008-01-02  6:19           ` Jeff Garzik
2008-01-02  6:39             ` Robert Hancock
2008-01-02  6:55               ` Tejun Heo
2008-01-03  0:27                 ` Robert Hancock
2008-01-02 17:23       ` Allen Martin
2008-01-02 18:57         ` Jeff Garzik
2008-01-02 23:23           ` Allen Martin
2008-01-03  0:21             ` Robert Hancock
2008-01-03  4:14               ` Mark Lord
2008-01-03  4:17               ` Mark Lord
2008-01-03  4:54                 ` Robert Hancock
2008-01-03 15:44                   ` Mark Lord
2008-01-03 15:47                     ` Mark Lord
2008-01-03 21:13                       ` Benjamin Herrenschmidt
2008-01-04  1:43                         ` Robert Hancock
2008-01-04  5:51                           ` Benjamin Herrenschmidt
2008-01-04  0:41                   ` Allen Martin
2008-01-04  2:51                     ` Robert Hancock
2008-01-08  0:10                     ` Robert Hancock
2008-01-11 23:18                       ` Gabor Gombas
2008-01-12  1:10                         ` Robert Hancock
