From: Tejun Heo <htejun@gmail.com>
To: Robert Hancock <hancockr@shaw.ca>
Cc: Gabor Gombas <gombasg@sztaki.hu>,
linux-kernel@vger.kernel.org, linux-ide@vger.kernel.org,
kluo@nvidia.com, AMartin@nvidia.com, pchen@nvidia.com
Subject: Re: sata_nv + ADMA + Samsung disk problem
Date: Wed, 02 Jan 2008 13:25:23 +0900
Message-ID: <477B1233.3090706@gmail.com>
In-Reply-To: <477B10F2.9000107@shaw.ca>

Robert Hancock wrote:
>> This is kind of a longstanding problem which has been partially worked
>> around, but it seems not entirely. This is what I had diagnosed some
>> time ago:
>>
>> "recently, some issues cropped up with command timeouts when a cache
>> flush command was immediately followed by an NCQ write. In this case,
>> sometimes when the NCQ write was issued, the status register changed
>> from 0x500 (Stopped and Idle) to 0x400 (Stopped) as it normally
>> appears to, however it seems like the controller would get hung in
>> that state, and we would time out with no notifiers set, the gen_ctl
>> register not indicating interrupt status, and the CPB response flags
>> still 0 as we left them, seemingly indicating the controller hasn't
>> done anything with it. Then, when the error handler kicks in we clear
>> the GO bit to put it back into register mode, but the Legacy flag in
>> the status register doesn't get set (or at least it takes longer than
>> 1 microsecond). Finally when we do an ADMA channel reset that seems to
>> get it responding again, until this happens the next time.
>>
>> From some experimentation, I found that when we are issuing a NCQ
>> command when the last command was non-NCQ, or vice versa, if I added in
>> a delay of 20 microseconds between setting up the CPB and writing to the
>> append register, the problem appeared to go away. Problem is I don't
>> know if that's because it actually needs this delay, or because it
>> changes the timing so that it happens to work even though we're doing
>> something wrong, there's some event we're not waiting for, etc.
>>
>> I've now verified that no switches between ADMA and register mode
>> occur near the time of these timeouts. Neither are we reading or
>> writing any of the ATA shadow registers while we're in ADMA mode."
>>
>> It seems likely that this is what is happening here (a switch from an
>> NCQ command to a non-NCQ command, then the non-NCQ times out). It
>> could be in some cases the 20 microsecond delay is not enough. But it
>> seems bogus that we should need such an arbitrary delay in the first
>> place.
>>
>> The question I had for NVIDIA regarding this that I never got answered
>> was, is there any reason why we would need a delay when switching
>> between NCQ and non-NCQ commands on ADMA, and if not, is there any
>> known cause that could cause the controller to get into this seemingly
>> locked-up state?
>
> Well, I guess I did sort of get an answer, but the only theory was that
> the flush and the NCQ commands were being overlapped, which shouldn't be
> possible (the libata core guarantees that, and if it didn't work it
> would affect all controllers).
>
> I'm kind of wondering if there's something funny going on with the
> notifier register stuff, which is supposed to tell us what commands have
> completed. We don't really use it at all (we had some problems with
> missed completions, etc. when I tried using it, also it doesn't work if
> ATAPI is enabled on the other port on the controller, apparently). I
> know these controllers will do strange things like not signalling
> interrupts for later events if you don't clear the notifiers in just the
> right way (that being mostly determined by trial and error).
>
> Or, maybe somehow the flush is getting issued before the controller is
> really "ready" for it somehow (it's not finished cleaning up after
> preceding NCQ command).
>
> It's pretty hard for me to figure out which of the above might be the
> case, especially without access to the detailed controller documentation.

Thanks a lot for the detailed explanation. Nvidia ppl, any ideas?
FLUSH is used regularly. We really need to fix this.

--
tejun
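
For readers following the thread, the workaround Robert describes (a
20 microsecond delay before writing the append register whenever the
command type switches between NCQ and non-NCQ) can be sketched roughly as
below. This is only an illustrative userspace model, not the sata_nv code:
struct port_model, delay_before_append() and MODE_SWITCH_DELAY_US are
hypothetical names made up for this sketch, and the only value taken from
the thread is the 20 microsecond figure. Whether the hardware actually
needs the delay is exactly the open question above.

```c
#include <stdbool.h>

/* Delay quoted in the thread; whether it is really required is unknown. */
#define MODE_SWITCH_DELAY_US 20u

/* Hypothetical per-port bookkeeping; not the driver's real structure. */
struct port_model {
    bool have_last;     /* true once at least one command was issued */
    bool last_was_ncq;  /* whether the previous command was NCQ */
};

/*
 * Return how many microseconds to wait between setting up the CPB and
 * writing the append register for the next command.  A delay is only
 * requested when the command type changes (NCQ -> non-NCQ or the
 * reverse); back-to-back commands of the same type get no delay.
 */
static unsigned int delay_before_append(struct port_model *p, bool is_ncq)
{
    unsigned int delay_us = 0;

    if (p->have_last && p->last_was_ncq != is_ncq)
        delay_us = MODE_SWITCH_DELAY_US;

    /* Remember this command's type for the next decision. */
    p->have_last = true;
    p->last_was_ncq = is_ncq;

    return delay_us;
}
```

In a real driver the returned value would presumably be passed to
something like udelay() just before the append-register write; the model
above only captures the decision of when the delay applies, for example
when a cache flush (non-NCQ) is immediately followed by an NCQ write.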