From: Tejun Heo <htejun@gmail.com>
To: Robert Hancock <hancockr@shaw.ca>
Cc: Gabor Gombas <gombasg@sztaki.hu>,
linux-kernel@vger.kernel.org, linux-ide@vger.kernel.org,
kluo@nvidia.com, AMartin@nvidia.com, pchen@nvidia.com
Subject: Re: sata_nv + ADMA + Samsung disk problem
Date: Wed, 02 Jan 2008 13:25:23 +0900
Message-ID: <477B1233.3090706@gmail.com>
In-Reply-To: <477B10F2.9000107@shaw.ca>

Robert Hancock wrote:
>> This is kind of a longstanding problem which has been partially worked
>> around, but it seems not entirely. This is what I had diagnosed some
>> time ago:
>>
>> "recently, some issues cropped up with command timeouts when a cache
>> flush command was immediately followed by an NCQ write. In this case,
>> sometimes when the NCQ write was issued, the status register changed
>> from 0x500 (Stopped and Idle) to 0x400 (Stopped) as it normally
>> appears to, however it seems like the controller would get hung in
>> that state, and we would time out with no notifiers set, the gen_ctl
>> register not indicating interrupt status, and the CPB response flags
>> still 0 as we left them, seemingly indicating the controller hasn't
>> done anything with it. Then, when the error handler kicks in we clear
>> the GO bit to put it back into register mode, but the Legacy flag in
>> the status register doesn't get set (or at least it takes longer than
>> 1 microsecond). Finally when we do an ADMA channel reset that seems to
>> get it responding again, until this happens the next time.
>>
>> From some experimentation, I found that when we are issuing a NCQ
>> command when the last command was non-NCQ, or vice versa, if I added in
>> a delay of 20 microseconds between setting up the CPB and writing to the
>> append register, the problem appeared to go away. Problem is I don't
>> know if that's because it actually needs this delay, or because it
>> changes the timing so that it happens to work even though we're doing
>> something wrong, there's some event we're not waiting for, etc.
>>
>> I've now verified that no switches between ADMA and register mode
>> occur near the time of these timeouts. Neither are we reading or
>> writing any of the ATA shadow registers while we're in ADMA mode."
>>
>> It seems likely that this is what is happening here (a switch from an
>> NCQ command to a non-NCQ command, then the non-NCQ times out). It
>> could be in some cases the 20 microsecond delay is not enough. But it
>> seems bogus that we should need such an arbitrary delay in the first
>> place.
>>
>> The question I had for NVIDIA regarding this that I never got answered
>> was, is there any reason why we would need a delay when switching
>> between NCQ and non-NCQ commands on ADMA, and if not, is there any
>> known cause that could cause the controller to get into this seemingly
>> locked-up state?
>
> Well, I guess I did sort of get an answer, but the only theory was that
> the flush and the NCQ commands were being overlapped, which shouldn't be
> possible (the libata core guarantees that, and if it didn't work it
> would affect all controllers).
>
> I'm kind of wondering if there's something funny going on with the
> notifier register stuff, which is supposed to tell us what commands have
> completed. We don't really use it at all (we had some problems with
> missed completions, etc. when I tried using it, also it doesn't work if
> ATAPI is enabled on the other port on the controller, apparently). I
> know these controllers will do strange things like not signalling
> interrupts for later events if you don't clear the notifiers in just the
> right way (that being mostly determined by trial and error).
>
> Or, maybe somehow the flush is getting issued before the controller is
> really "ready" for it somehow (it's not finished cleaning up after
> preceding NCQ command).
>
> It's pretty hard for me to figure out which of the above might be the
> case, especially without access to the detailed controller documentation.

Thanks a lot for the detailed explanation. Nvidia ppl, any ideas?
FLUSH is used regularly. We really need to fix this.
--
tejun
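
The workaround described in the quoted text (a 20 microsecond delay between setting up the CPB and writing the append register, applied only when the command type flips between NCQ and non-NCQ) can be sketched as a small userspace model. This is an illustration under stated assumptions, not the sata_nv code: `struct port_model`, `issue_command()`, `needs_switch_delay()` and `is_stopped_not_idle()` are hypothetical names, and the two status bits are inferred only from the values quoted above (0x500 = Stopped and Idle, 0x400 = Stopped).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values inferred from the message: 0x500 = Stopped|Idle, 0x400 = Stopped.
 * The macro names are illustrative, not taken from the driver. */
#define ADMA_STAT_IDLE    0x0100u
#define ADMA_STAT_STOPPED 0x0400u

/* Delay that was observed to hide the timeouts; whether the hardware
 * actually requires it is exactly the open question in the thread. */
#define MODE_SWITCH_DELAY_US 20u

/* Hypothetical per-port bookkeeping for this sketch. */
struct port_model {
    bool have_last;          /* false until the first command is issued */
    bool last_was_ncq;       /* type of the previously issued command */
    unsigned total_delay_us; /* delay accumulated so far (model only) */
};

/* True when the command type flips between NCQ and non-NCQ. */
static bool needs_switch_delay(const struct port_model *p, bool is_ncq)
{
    assert(p != NULL);
    return p->have_last && p->last_was_ncq != is_ncq;
}

/* Models one issue: returns the delay (in microseconds) that would be
 * inserted after the CPB is set up and before the append register write. */
static unsigned issue_command(struct port_model *p, bool is_ncq)
{
    unsigned delay = needs_switch_delay(p, is_ncq) ? MODE_SWITCH_DELAY_US : 0u;

    /* A real driver would busy-wait here, then write the append register. */
    p->total_delay_us += delay;
    p->have_last = true;
    p->last_was_ncq = is_ncq;
    return delay;
}

/* The state seen at timeout: Stopped set, Idle clear (0x400, not 0x500).
 * On its own this is also the normal post-issue state; it only indicates
 * a hang when the command never completes. */
static bool is_stopped_not_idle(uint16_t status)
{
    return (status & ADMA_STAT_STOPPED) && !(status & ADMA_STAT_IDLE);
}
```

In this model an NCQ write followed by a FLUSH (non-NCQ) incurs the delay once, and two consecutive NCQ commands incur none. It does not address the possibility raised above that 20 microseconds is sometimes not enough.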